Noise resistance of next-generation reservoir computing: a comparative study with high-order correlation computation

Reservoir computing (RC) methods have received more and more attention and applications in chaotic time series prediction with their simple structure and training method. Recently, the next-generation reservoir computing (NG-RC) method has been proposed by Gauthier et al. (Nat Commun 12:5564, 2021) with less training cost and better time series predictions. Nevertheless, in practice, available data on dynamic systems are contaminated with noise. Though NG-RC is shown highly efficient in learning and predicting, its noise resistance captivity is not clear yet, limiting its use in practical problems. In this paper, we study the noise resistance of the NG-RC method, taking the well-known denoising method, the high-order correlation computation (HOCC) method, as a reference. Both methods have similar procedures in respect of function bases and regression processes. With the simple ridge regression method, the NG-RC method has a strong noise resistance for white noise, even better than the HOCC method. Besides, the NG-RC method also shows a good prediction ability for small colored noise, while it does not provide correct reconstruct dynamics. In this paper, other than reconstruction parameters, four numerical indicators are used to check the noise resistance comprehensively, such as the training error, prediction error, prediction time, and auto-correlation prediction error, for both the short-time series and long climate predictions. Our results provide a systematic estimation of NG-RC’s noise resistance capacity, which is helpful for its applications in practical problems.


Introduction
Analysis and prediction of data play an important role in people's production and life, such as weather prediction, environmental pollution control, earthquake prediction, financial data analysis, speech recognition, image processing, and aircraft control [1][2][3][4][5][6][7][8][9][10]. For the field of time series prediction, various Supplementary Information The online version contains supplementary material available at https://doi.org/10.1007/ s11071-023-08592- 7. methods have been proposed and obtained more satisfactory prediction results, such as system dynamics reconstruction and neural network prediction. However, in various applications, all the prediction methods are facing a central problem-noise, which brings great difficulties for prediction. For example, when extracting valuable signals from the collected data, the correlation between the noises affects the estimation performance of the signal parameters seriously [11]. In radar monitoring, the presence of noise can make the dynamic reconstruction results unstable [12]. In remote sensing, the effect of noise on the prediction of system dynamics can lead to biased detection of spatiotemporal concentration distribution information [13]. Therefore, the ability to accurately obtain system dynamics from noisy data becomes one of the essential indicators of prediction methods.
To overcome the effects caused by noise in the data, researchers have proposed a variety of dynamical reconstruction methods, such as smoothing method [14], polynomial fitting method [15], local dynamics global fitting method [16], and high-order correlation computation (HOCC) method [17]. The smoothing method takes multi-step averaging to attenuate the noise effect. The polynomial fitting method proposed by Lu et al. [15] directly uses polynomials to fit the time series with noise. The local dynamics global fitting method proposed by Wang and Lan et al. [16] uses globally invariant polynomials to fill all the local pieces of time series. The HOCC method proposed by Chen et al. uses the differential-time correlation of variables to filter out the noise and solves all the unknown coefficients in the system equation by calculating the high-order correlation between variables. In HOCC, the use of differential-time correlation to remove the effect of noise can adjust the time difference in a considerable range to adapt to different noise conditions.
In the last decades, lots of new methods have been proposed to learn, predict and reconstruct the dynamics from data, such as different network structures detection methods [18,19], extra local driving for topology inference [20], and the compressive sensing technology [21,22]. Among all these new methods, neural networks have also attracted much attention and applications in the field of prediction due to their good nonlinear mapping capability, self-learning adaptation ability, and parallel information processing capability [23][24][25][26][27]. At the beginning of this century, a novel recurrent neural network method, reservoir computing (RC), was proposed to bring a breakthrough to chaotic forecasting with its simple structure and training method [28][29][30][31][32][33][34]. Recently, the nextgeneration reservoir computing (NG-RC) method, proposed by Gauthier et al. [35] simplifies the RC computation system, significantly reducing its demand on computer resources and saving a lot of time. The new method creates linear and nonlinear feature vectors directly from discretely sampled input data without using a neural network. In NG-RC, the linear feature vector consists of constant terms and observations of the input vector at the current and certain previous time steps. The nonlinear feature vector consists of a two-by-two combination of linear components. The NG-RC method is 33-162 times faster than traditional RC calculations and requires only 28 neurons to achieve the accuracy that would have been achieved with 4000 neurons. Besides, the new method uses 400 data points to obtain the same results as the traditional RC using 5000 or even more data points for training (the exact number of data points depends on the required accuracy).
The high efficiency of the NG-RC methods attracts lots of attention. However, as one of the learning and prediction methods, the noise resistance of NG-RC is not discussed in detail yet, limiting its applications in practice where the available data is always contaminated with noise. Different from classical reservoir computing methods, the NG-RC method depends on the feature vectors, which is similar to the function bases of some reconstruction methods, such as HOCC. As one of the well-developed noise-resistant methods, the effectiveness, robustness, and adaptability of HOCC to different conditions have also been studied comprehensively in simulation validation [17,36,37]. Hence, to study the noise resistance of NG-RC comprehensively, we take the HOCC as a reference and compare the differences between these two methods and their noise resistance ability. Surprisingly, we find that even though the NG-RC method does not have special designs for noise resistance, it surpasses the HOCC method for systems with white noise and provides reasonable prediction ability for small colored noise.
In this paper, we compare the noise resistance ability and characteristics of NG-RC with reference to HOCC methods from theoretical analysis and numerical experiments. In terms of theory, we analyze the similarities between the two methods. Both methods have similar procedures in the respect of function bases and regression processes, especially when one considers the NG-RC method without time-delay function bases. We take the Lorenz system as an example and explore the difference in the coefficient reconstructed by the two methods. In terms of numerical experiments, four indicators are introduced to show the noise resistance of methods, such as the training error, prediction error, prediction time, and auto-correlation prediction error. Both the white and colored noise are considered in this paper. The effects of the noise intensity, training length, and sampling interval on the noise resistance ability of the two methods are explored comprehensively. Besides, the noise resistance to colored noise with the change of time delay in NG-RC and HOCC is also studied to study the potential variations of the method for colored noise.
This paper is organized as follows: Section 2 introduces NG-RC and HOCC methods and compares their difference theoretically. Sections 3 and 4 show the numerical experimental results of the NG-RC method without and with time-delay function bases on the white and colored noise-driven system. Section 5 concludes the paper.

Theoretical comparative analysis of NG-RC and HOCC
Considering the dynamics of an arbitrary noise-driven system as: _ xðtÞ ¼ f ðxðtÞÞ þ CðtÞ; where N denotes the time series length, and Dt is the time sampling interval. In this paper, we consider the inverse problem that given the time series in Eq. (2), how to obtain the original dynamical system Eq. (1). Here, we take the Lorenz system [5] as the example of an original noisedriven dynamic system. It is one of the most famous models in chaos studies, developed in 1963 as the first system discovered to produce chaotic attractors. The Lorenz system follows the dynamics of Eq. (1) with three coupled nonlinear differential equations: where x, y, z denote the state variables. In this paper, we take the parameters r ¼ 10; q ¼ 28; b ¼ 8=3 to for chaotic time series. In the following, we use NG-RC and HOCC methods for the inverse problems with noise to get the original Lorenz dynamical systems.

NG-RC method
The NG-RC method is developed from the traditional RC. It is no longer requiring a linear combination of the input signals using a randomly generated neural network to obtain the output signal, but instead directly creates feature vectors with discretely sampled input data, where the feature vectors are called bases. For prediction, the bases consist of three parts: constant term, linear terms, and nonlinear terms. For the Lorenz system, the linear terms at the moment t k are usually composed of xðt k Þ; yðt k Þ; zðt k Þ; xðt kÀ1 Þ; yðt kÀ1 Þ; zðt kÀ1 Þ [35], where k ¼ 2; . . .; N À 1. The nonlinear terms component is composed of two combinations of constant and linear terms, totaling 21 terms. The whole basis function matrix P is obtained from the observed data x, where the k-th column P k for time t k is: P k ¼ ð1; xðt k Þ; yðt k Þ; zðt k Þ; xðt kÀ1 Þ; yðt kÀ1 Þ; zðt kÀ1 Þ; x 2 ðt k Þ; xðt k Þyðt k Þ; xðt k Þzðt k Þ; xðt k Þxðt kÀ1 Þ; xðt k Þyðt kÀ1 Þ; xðt k Þzðt kÀ1 Þ; yðt k Þyðt kÀ1 Þ; yðt k Þzðt kÀ1 Þ; z 2 ðt k Þ; y 2 ðt k Þ; yðt k Þzðt k Þ; yðt k Þxðt kÀ1 Þ; zðt k Þxðt kÀ1 Þ; zðt k Þyðt kÀ1 Þ; zðt k Þzðt kÀ1 Þ; x 2 ðt kÀ1 Þ; xðt kÀ1 Þyðt kÀ1 Þ; xðt kÀ1 Þzðt kÀ1 Þ; The basic assumption of NG-RC is that the function base is complete for the original dynamical systems, from which the dynamical function f ðxÞ in Eq. (1) could be expressed as a linear transformation of P as where A 1 is the coefficient matrix. The left hand side of Eq.(5) can be estimated through a simple difference method as: Taking X k as the k-th column, one gets the target matrix X whose size is 3 Â L, where L ¼ N À 2 is the length of training data. Correspondingly, the size of coefficient the matrix A 1 and basis function matrix P are 3 Â 28 and 28 Â L respectively. In NG-RC method, the ridge regression method is used to solve A 1 : where P T is matrix transpose of P, a ridge regression parameter, and I the identity matrix. After solving the reserve problem and getting A 1 , an Euler-like integration step can be used to obtain the prediction time series y: with i ¼ 1; 2; . . .; 1. For learning and prediction of chaotic time series, the NG-RC method using bases with time delay is shown to have the same or even better prediction capacity compared to the traditional RC using randomly generated neural networks [35]. But here we focus on its noise resistance ability.

HOCC method
In this paper, we take the HOCC method as the reference, which uses the differential-time correlations of variables to remove noise and solves all unknown coefficients in the system equation by calculating the high-order correlations among the variables. When using the HOCC method for prediction, it is necessary to select an appropriate function basis in advance, denoted as Q as not to be confused with the function basis for NG-RC. For the Lorenz system, the polynomial functions are often chosen as the bases. Similar to the NG-RC method, the whole basis function matrix Q is obtained from the observed data x, where the k-th column Q k for time t k as: It is worth to be noted that the basis function matrix Q is exactly a part of the function matrix P except for the time-delay terms. The basic assumption of HOCC is also the same as NG-RC, where the function base Q is complete for the original dynamical systems, from which the dynamical function f ðxÞ in Eq. (1) could be expressed as a linear transformation of Q as where A 2 denotes the coefficient matrix of Q. Up to this step, the two methods, NG-RC and HOCC, are exactly the same, with only a slightly difference in the choice of function basis. If we use the ridge regression method to get A 2 directly, this is the NG-RC method without time-delay terms. However, in the HOCC method, one more step is applied to remove the noise.
Here, one takes the function vector Q T ðxðt À sÞÞ with time delay as and right multiple it with Eq. (1). Then by taking time averaging to calculate all related correlations, we have with BðÀsÞ ¼ ðB 1 ðÀsÞ; B 2 ðÀsÞ; . . .; B 10 ðÀsÞÞ T ; where l ¼ 1; 2; . . .; 10, \ Á [ denotes time averaging. The time delay s is assumed to satisfy the inequality 0 % s d ( s ( 1, that it is much larger than the correlation time of dynamical noise s d , and much smaller than the characteristic times of deterministic network dynamics, previously assumed to be of order 1. In this case, noises and correlations are decorrelated as since the fast-changing noise must not be correlated with any variable data of previous times, disregarding any forms of colored noises. Now with the noisedecorrelation of Eqs. (14) and (12) can be reduced to which leads to which could be solved directly. This differential-time correlation-based procedure to get A 2 is the essential part of the HOCC method and also the major difference between the NG-RC method. After getting the coefficient matrix A 2 , one can reconstruct the original dynamic is as and also iterate the system and obtain the predicted time series y similar to NG-RC. It can be seen that both the NG-RC and HOCC methods use the same idea to fit the original system with function bases that are selected in advance and then solve its coefficient matrix. If both methods choose the same bases, the target matrix for its optimization is the same. The difference is that in the HOCC method, theĈ in Eq. (15) is composed of the differential-time correlations of the basis vectors, and the coefficient matrix A 2 is regression solved after the correlations, so that the noise is removed by using the property that the noise and the correlators must be decorrelated. While in the NG-RC method, the matrix Y is directly combined by the basis vectors. The ridge regression method is used directly to solve the coefficient matrix A 1 , and by adjusting a to remove noises while preventing over-fitting.
It is straightforward to notice that these two methods are comparable in theory. In the following, we will use numerical methods to check how well the simple ridge regression method used in NG-RC is for the noise resistance, compared to the HOCC method which has a special designed time difference correlations procedure for noise elimination.

The NG-RC method without time-delay function bases
In this paper, the Lorenz system in Eq. (3) is set with r ¼ 10, q ¼ 28, b ¼ 8=3(the Lorenz system is chaotic with this parameter), and the sampling interval reads h ¼ 0:0002. The data of the original Lorenz system is generated by the iteration of the Runge-Kutta method, which is noted as y Lorenz . Two types of noise are considered, as the Gaussian white noise and Ornstein-Uhlenbeck colored noise. To show the noise effect, we increase the noise strength gradually, from 0.0001 to 25. The colored noise correlation time is fixed at 25 h. The phase diagrams with either white or colored noise are shown in Fig. 1. It is clear that noises affect the dynamics of the Lorenz system dramatically in Fig. 1. How to get the original system through such data is the challenging problem that we want to solve with NG-RC and HOCC methods.
In Section 2, we know that if the same function bases are selected, i.e., the NG-RC method without time-delay function bases, then the goal matrix of the two methods is the same. But NG-RC and HOCC have different procedures for noise resistance. Hence, in the following, we compare NG-RC and HOCC in two steps. In this section, we only focus on the NG-RC method without time-delay function bases, denoted as NG-RC10 for it only considering the 10 function bases. And in the next section, we will use the full bases introduced in NG-RC.

White noise-driven system
First, consider the white noise-driven Lorenz system. With HOCC and NG-RC methods, the dynamical system is reconstructed. Compared with the original Lorenz system, We obtain reconstruction errors of these two methods and show them in Table 1.
When the D is smaller, the standard deviation of the coefficients corresponding to the Lorenz system obtained by the HOCC method is about 10 À5 to 10 À6 , and the other terms are about 10 À2 . When the D is large, the standard deviation of coefficient terms is about 10 À3 , and other terms are about 10 À2 . It can be found that the reconstruction result of NG-RC10 is close to that of the HOCC method. HOCC method has only slightly better noise resistance ability than NG-RC10 for white noise cases. Here, the train steps L and sampling interval h are chosen as L ¼ 15 million and h ¼ 0:0002, which are chosen as one of the best parameters for the HOCC method, studied in [37].
To further explore the processing ability of the two methods for noisy data with different parameters, and better understand their noise resistance characteristics, we adopted the control variable method to analyze the noise resistance performance of the NG-RC and HOCC methods from the perspective of the influence of train steps L, sampling interval h and noise strength D on the predicted results respectively. Instead of the reconstructed parameters, here we use a single numerical indicator introduced in [35], E t , to estimate the reconstruction result, which reads as mean square error between reconstructed time series y and the original one y Lorenz . The smaller the E t is, the better the noise resistance ability of the method. The experimental results of E t changes with the above three variables are shown in Fig. 2. Here, considering that the ridge regression parameter a of the NG-RC method has a certain influence on its noise resistance ability, for each data point in Fig. 2, a was optimized to obtain the optimal E t for display. The optimization process is in Supplementary Note 1.
When the sampling interval is sufficiently small, as shown in Fig. 2a, fixing h ¼ 0:00002 and choosing four different intensities of white noise to drive the system for training, the E t of the two methods do not differ much when the amount of training data is small, and both decrease as the number of train steps increases. These results coincide with the one obtained from the reconstructed parameters shown in Table 1 that the NG-RC10 method and HOCC method have almost the same noise resistance in these cases.
However, as shown in Fig. 2b, increasing the sampling interval to h ¼ 0:001, the reconstruction error E t of the HOCC method appears to be a ''plateau area'' when D is small, while E t continues decreasing with the increase of L for NG-RC10 method. The simpler NG-RC method works even better than HOCC for time series with large sampling intervals and sufficient large train steps. More details can be found in Fig. 2c, where the training data are the same and large enough (15 million), and the white noise-driven system with three different sampling intervals is selected for training. The reconstruction error E t of both methods increases with the increase of D, while E t of the HOCC method appears ''plateau area'' at h ¼ 0:001 and h ¼ 0:0002. Combined with Fig. 2b, the HOCC method can no longer decrease E t when training and predicting data with minor D and greater h, no matter how to reduce D or increase the number of train steps, and this lower limit varies with h. The larger the h, the higher the lower limit.
The above results are corroborated in Fig. 2d. For NG-RC10, E t is more affected by D, but the variation of E t with h is smaller for the same D system. For the HOCC method, when h is small, the smaller D is, the smaller E t is, similar to the NG-RC10 method. But for large sample interval h, the difference between E t corresponding to different D decreases until it overlaps when h [ 0:001. In these cases, a decrease in the noise strength D does not result in the improvement of prediction capacity. The sampling interval defines a minimum value of E t , increasing with the increase of h. As a result, for time series with large sampling intervals, the NG-RC10 method works better than HOCC, as shown in Fig. 3. Here, the a is also optimized. When h ¼ 0:00002, the noise resistance ability of the two methods is comparable in Fig. 3a. When increasing h to 0.0002 and 0.001, the E t of the HOCC methods becomes larger for smaller intensity noise, while the results of NG-RC10 are less affected, as shown in Fig. 3b, c. The appearance of the ''plateau area'' of the HOCC method is an open question and out of the scope of this paper, but from this phenomenon, the NG-RC10 method could provide a better prediction for the dynamical system with white noise.
In the above results, the prediction error E t is used to describe the noise resistance ability of the method. Meanwhile, we noted that other indicators were also used in previous studies to measure the reconstruction or prediction ability of the method. For example, in [16,17], the results obtained by the reconstruction method are compared with the coefficients of the original dynamic system to measure its reconstruction ability. In [35], the mean square error of the prediction time sequence obtained by the noise resistance method Table 1 When the white noise-driven system is reconstructed, the standard deviation of the coefficients obtained by the two methods and the mean of the difference correspond to the original Lorenz system and the training data was calculated to measure the method's fitting ability to the training data. In [38], the prediction ability of the method was measured by the coincident steps between the prediction sequence and the original noise-free sequence. To understand the two methods in multiple dimensions, we check the noise resistance ability of the two methods in terms of a more refined four-dimensional representation. Other than the prediction error E t , we consider the training error E n , prediction time T p , and auto-correlation prediction error E a .
(1) The training error E n , the mean square error of y and y train , is defined as It is the objective function of the training phase and is used to measure the learning error with noise.
(2) The prediction time T p , the coincident steps of y and y Lorenz , is defined as the maximum time step k where the square error between the predicted time series and original Lorenz system is smaller than 0.01.
It is important to note here that T p is correlated with E t . A Smaller prediction error E t results in a larger prediction time T p . But for different dynamical systems, T p also depends on the system's maximum Lyapunov exponent, where the Lyapunov time is usually used as the unit of T p .
(3) Auto-correlation prediction error E a is defined as the time series auto-correlation coefficient prediction error between y and y Lorenz . To measure the difference between the predicted time series and the original Lorenz system time series long-time evolution behavior of the two methods. The auto-correlation coefficients of the time series r k ðk ¼ 1; 2; . . .; NÞ are calculated using the following equation: Here fx t g represents a set of time series, T represents the length of the time series, and fx tþk g represents the time series after delaying fx t g by k steps, leading to The training and prediction results of the two methods using the above four indexes for the white noisedriven system are shown in Fig. 4, where the two methods use the same data set for training, and the size of the training data is selected to be 15 million. It can be seen that the two methods have very different performances in the four indicators. For training error E n , there two methods have similar performance, satisfying E n ¼ 10 À5 approximately as shown in Fig. 4a, showing that both methods has reached the optimized result as they designed to get. However, such optimized results from these two methods are different, shown as by the prediction error E t in Fig. 4b, especially when D is small. Here E t of NG-RC10 is smaller than that of the HOCC method, the dynamical system from NG-RC10 is closer to the original Lorenz system than HOCC. This fact results in a longer prediction time T p and smaller autocorrelation prediction error E a of NG-RC10 in Fig. 4c, d, describing the better prediction ability for the short and long evolution of the system.
Compared with the HOCC method, we can conclude that the NG-RC method with the same function bases has a better noise resistance and prediction ability for white noise. This is from the learning limit of HOCC by large sampling intervals. Studying the mechanism of HOCC is out of the scope of this paper. But for the NG-RC method, it is sufficient to conclude a positive statement for its noise resistance to white noise. We will consider the NG-RC method with timedelay terms in the next section and obtain the same results.

Colored noise-driven system
The spectrum of noise is an important feature. The HOCC method has developed a full frame to deal with colored noise and shows a good noise resistance ability [37]. Here we take the Ornstein-Uhlenbeck noise and check the noise resistance of the NG-RC10 method with colored noise.
Similar to the analysis of white noise, we first reconstruct the parameters and check the reconstruction errors of HOCC and NG-RC10. The results are shown in Table 2. No matter how small the noise strength D is, the reconstruction errors of NG-RC10 are much larger than HOCC. As the variations of the reconstructed parameters are kept sufficiently small, one can conclude that the reconstructed parameters by NG-RC10 have systematic bias from the original dynamical systems. This error is removed by HOCC through differential-time correlation methods.
The training and prediction results of the two methods using the four numerical indicators for the colored noise-driven system are shown in Fig. 5, where the two methods use the same data set for training, and the size of the training data is selected to be 15 million. The optimization process of a is shown in Supplementary Note 2. Similar to the white noise cases, E n of the HOCC method and NG-RC10 are almost the same, but for the prediction error, E t of the NG-RC10 method is always larger than E t of HOCC. Consequently, HOCC has better prediction ability when dealing with the prediction time T p and autocorrelation prediction error E a . However, it is interesting to note that when the noise strength is sufficiently small as D\0:01, the NG-RC10 and HOCC have almost the same numerical indicators, showing the same prediction capacity, even the reconstructed parameters shown in Table 2 from The power of HOCC to deal with colored noise is from the time delay s in the differential-time correlations. When s is much larger than the correlation length s d of the noise and much smaller than the characteristic time of the system dynamics, HOCC has a good noise resistance of colored noise [4,17]. As shown in Fig. 6, the prediction error E t decreases with the increase of s to a minimum value after s [ 100. As for the NG-RC10 method, its prediction error E t is the same as the HOCC with s ¼ 1.
The NG-RC10 methods without the differentialtime correlations could not reconstruct the original dynamics correctly, as it mixes the dynamics and noise correlations. However, if the noise strength is sufficiently small, the NG-RC10 method could give the same good prediction of the system's evolution as the ones from the HOCC method, which has good reconstructed parameters. The mechanism of this phenomenon is an open question, and out of the scope of this paper. We will study it in our following work with various types of colored noise.

The NG-RC method with time-delay function bases
In the above section, NG-RC10 shows a good effect in antiwhite noise prediction and even performs better than the HOCC method when the sampling interval h is large. However, in colored noise resistance prediction, its noise resistance ability is worse than HOCC, especially when the noise strength D is large.
In this section, we further consider the NG-RC method with all 28 function bases and explore the contributions of the additional time-delay terms. It is worth noting that there are two parameters in the method: k and s. The linear part of the function bases of the method consists of the input vector at the current and at k À 1 previous times steps spaced by s, where s À 1 is the number of skipped steps between consecutive observations. For the noiseless Lorenz system, k ¼ 2 and s ¼ 1 are often used. Increasing k means adding more time-delay bases to the linear terms. And the s is similar to the s in the HOCC method, increasing s is to train with data that skip more observations between consecutive observations. This section will optimize both parameters. Table 2 When the colored noise-driven system is reconstructed, the standard deviation of the coefficients obtained by the two methods and the mean of the difference correspond to the original Lorenz system

White noise-driven system
To optimize parameters k and s to obtain the optimal prediction results, enough data was used for training. The E t under different k and s are obtained by training 20 white noise-driven systems with different D. Figure 7 shows the result when D ¼ 0:1.
As shown in Fig. 7, E t increase with the increase of k and s, indicating that increasing the number of timedelay bases in the basis function and using data after skipping more observations between consecutive observations for training are counterproductive to the noise resistance ability of the NG-RC method. Experiments on all the white noise-driven systems with different D show similar results. The k and s that make E t reach the minimum are mostly k ¼ 2 and s ¼ 1, which are often used to predict the noiseless Lorenz system. With optimized parameters of k and s, the results of the four indexes are shown in Fig. 8. See Supplementary Note 3 for the optimization process of a. Relation between E t and s of NG-RC10 and HOCC method. The sampling interval is h ¼ 0:0002, and the noise correlation length is s d ¼ 25h  Figure 8 shows the prediction results of NG-RC10 and NG-RC after optimized parameter in the white noise-driven system. The difference between NG-RC10 and NG-RC is small in all four numerical indicators. The NG-RC method with time-delay function bases has no significant effect on the prediction performance of the white noise-driven system. Hence both the NG-RC10 and NG-RC methods have a good white noise resistance.

Colored noise-driven system
As for the colored noise, the k and s are still optimized. Figure 9 shows the change of E t with k and s when D ¼ 0:0001. For different k, when s ¼ 1, E t is small, and then with the increase of s, E t rapidly peaks, and then gradually decreases. The difference is that the E t when k ¼ 2 is always smaller than the E t corresponding to other k, and presents a U-shaped change with s, and its lowest point is close to the position where E t of HOCC method starts to stabilize when it changes with s in Fig. 6. It can be seen that the selection of appropriate k and s in the time-delay bases have a certain effect on the denoising of the NG-RC method. Figure 10 shows the results of four indexes of the NG-RC method optimized after k and s and the NG-RC method without time-delay bases. Each data point in the figure is the result of a optimization, see Supplementary Note 4 for details.
In Fig. 10a, E n of the NG-RC method with timedelay bases is very similar to that of the NG-RC method without time-delay bases, indicating that they have similar fitting effects on training data. In Fig. 10b, the NG-RC method with time-delay bases is smaller than that without time-delay bases when the D is relatively large or small. Similarly, for T p and E a in Fig. 10c, d, except for several data points with moderate D, the T p and E a of other data points of the NG-RC method with time-delay bases are smaller than those of NG-RC method without time-delay bases. It can be seen from the above results that the resistance ability to colored noise of the NG-RC method optimized with k, s, and a is improved compared with the NG-RC method without time-delay bases. However, compared with the HOCC method in Fig. 5, there is still a certain gap. Further improving the NG-RC's resistance ability to colored noise is an open question and a promising approach.

Conclusions
In this paper, the noise resistance ability of the NG-RC method is studied from the theoretical as well as numerical experimental, with reference to the HOCC method. Various aspects of these two methods are compared, such as the reconstruction error of parameters, training error E n , prediction error E t , prediction Fig. 9 Relationship between E t and k or s of the colored noisedriven system by NG-RC method with time-delay function bases. The parameters of the colored noise-driven Lorenz system are h ¼ 0:0002, D ¼ 0:0001, and s d ¼ 25h time T p , and the auto-correlation prediction error E a . The similarity of the NG-RC and HOCC is shown theoretically, from which we study the NG-RC method without time-delay function bases. Such NG-RC10 method share the same function bases with HOCC. With different procedures and algorithms, we find that the NG-RC10 method has a better noise resistance when dealing with white noise than HOCC. Even for colored noise, the NG-RC10 method also shows a good prediction power when the noise strength is small, comparable with HOCC, while it cannot provides the correct reconstructed parameters.
The NG-RC method with time-delay function bases is also discussed. Such additional time-delay terms are helpful in the noise resistance of this method. The NG-RC method with and without time-delay function bases has the same noise resistance capacity for white noise. But for the colored noise, the NG-RC method with time-delay function bases works better by optimizing k and s in some conditions.
Reservoir computing methods, including the NG-RC method we discussed in this paper, is a promising study field of inverse problems of the dynamical system. In this paper, we show the simple ridge regression method in reservoir computing has a relatively strong noise resistance for white noise. But how to improve its noise resistance for colored noise is still an open question and a promising research topic.
Author Contributions All authors contributed to the study conception and design. Material preparation, numerical simulation and analysis were performed by SL and JG. The first draft of the manuscript was written by SL and all authors commented on previous versions of the manuscript. All authors read and approved the final manuscript. Data Availability Enquiries about data availability should be directed to the authors.

Declarations
Conflict of interest The authors have no relevant financial or non-financial interests to disclose.

Open Access
This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.