1 Introduction

In the present scenario, the prediction of non-linear time series is highly important. The ARIMA model is well suited to modelling stationary, linear time series [1,2,3], but it is unsuitable for non-linear series because it relies only on linear dependence among the variables [4]. In [5], Badr et al. obtained forecasts of internet web traffic using Holt’s linear trend, BATS and TBATS models. Numerous non-linear time series prediction models have been developed. Models deploying artificial intelligence techniques such as Genetic Algorithms [6, 7] and Artificial Neural Networks [8,9,10] can be found in the literature. Mahmoudzadeh predicted carbon monoxide emissions by deploying an Imperialistic Competitive ANN [11]. Models using ANFIS also appear, for example in [12] for the emission of carbon monoxide, in [13] for predicting the heat-transfer coefficient in purified water pool boiling, and in [14] for predicting the daily concentration of atmospheric carbon monoxide. A NARX model was suggested by Rosadi et al. to compare learning algorithms for seasonal time series prediction [15].

To forecast non-linear and non-stationary data, prediction models incorporating Empirical Mode Decomposition (EMD), introduced by Huang et al. [19], can also be found in the literature [16,17,18]. Models deploying Ensemble Empirical Mode Decomposition (EEMD), introduced by Wu and Huang [20], mitigate the dominant pitfall of EMD, namely mode mixing. Several researchers have constructed models assimilating EEMD, such as Bao et al. [21], Xie et al. [22] and Jiang et al. [23]. P Sameer and M C Lineesh suggested a hybrid model combining EEMD, SVD and LSTM to forecast atmospheric CO over the Indian region [24] and extended the work by deploying CEEMDAN instead of EEMD [25]. They also developed a hybrid model combining EEMD, SVD and Moving Average for the same prediction task [26]. CEEMDAN, developed by Torres et al. [27], reduces mode mixing and computational cost relative to EMD and EEMD, but it has drawbacks of its own: residual noise appears in the modes and spurious modes can arise. ICEEMDAN, developed by Colominas et al. [28], can effectively reduce these drawbacks. Like EMD, EEMD and CEEMDAN, ICEEMDAN aims to break the data into several Intrinsic Mode Functions (IMFs). An IMF is a function satisfying two conditions: (i) the number of extrema and the number of zero crossings differ by at most one, and (ii) the mean of the upper and lower envelopes determined by the local extrema is zero everywhere [19].

Neural networks are artificial networks that mimic the activity of the human brain through an input-output processing mechanism. Haykin [29] provides an extensive review of neural networks. By choosing appropriate weights and activation functions, neural networks can be utilized to predict time series; one of the earlier applications of neural networks to non-linear time series prediction is by Eric A. Wan [30]. Computational Intelligence (CI) techniques are more flexible than traditional statistical models such as ARIMA and make few or no prior assumptions about the input variables. Additionally, according to [31], CI techniques are better equipped to handle outliers, missing data and noisy data. Hence, CI techniques are frequently employed to capture complex, non-linear relationships in high-dimensional settings. As the most typical of the CI methods, Artificial Neural Networks (ANNs) play a significant role in the analysis and forecasting of time series, as in [32,33,34]. There are various deep learning architectures, including the recurrent neural network (RNN) [35, 36] and the convolutional neural network (CNN) [37, 38], that exploit different features of the input data. The literature also contains other specialized networks, such as a dynamic neural network for non-convex portfolio optimization [39], a non-linear neural circuit for Bluetooth-aided mobile phone localization [40] and distributed recurrent neural networks for cooperative control of manipulators [41]. In general, CNNs are not well suited to capturing the temporal information present in input data, so RNNs have emerged as the dominant choice in research fields involving sequential data. However, RNNs are unable to connect the pertinent information when there is a significant gap between the relevant inputs. Hochreiter and Schmidhuber [42] introduced Long Short-Term Memory (LSTM) to manage such long-term dependencies. In contrast to conventional RNNs, the LSTM network can effectively learn time series data spanning long durations and automatically identify the most suitable time intervals for making predictions. The majority of impressive deep learning outcomes involving RNNs have been achieved using LSTM, which has consequently become a central point of interest within the field. Over the last decade, the LSTM network has been applied successfully in various domains such as speech recognition [43, 44], sentence embedding [45], acoustic modeling [46] and trajectory prediction [47].

A new time series prediction model amalgamating ICEEMDAN, SVD and LSTM is introduced in this study. ICEEMDAN decomposes the data into several IMF components together with a residue, and the components are de-noised using SVD after being converted into Hankel matrices. A Hankel matrix is one whose entries are constant along each skew-diagonal [48]. A Hankel matrix of order \(l \times k\) can be expressed as

$$\begin{aligned} P=\begin{pmatrix} q_1 &{} q_2 &{} q_3 &{} \cdots &{} q_k\\ q_2 &{} q_3 &{} q_4 &{} \cdots &{} q_{k+1}\\ q_3 &{} q_4 &{} q_5 &{} \cdots &{} q_{k+2}\\ \vdots &{} \vdots &{} \vdots &{} &{} \vdots \\ q_l &{} q_{l+1} &{} q_{l+2} &{} \cdots &{} q_n \end{pmatrix}, \end{aligned}$$

where \(l+k-1=n\). The LSTM network is deployed for prediction.
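As an illustration of this Hankel embedding, a matrix of the above form can be constructed directly with MATLAB's built-in hankel function; the following is a minimal sketch in which the series q and window length l are toy choices.

```matlab
% Minimal sketch: embed a series q of length n into an l-by-k Hankel
% matrix with l + k - 1 = n, using MATLAB's built-in hankel.
q = (1:10)';                   % toy series, n = 10
l = 4;                         % number of rows (window length)
k = numel(q) - l + 1;          % number of columns, so that l + k - 1 = n
P = hankel(q(1:l), q(l:end));  % P(i,j) = q(i+j-1)
```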

2 Methodology

2.1 Singular Value Decomposition (SVD)

A \(p\times q\) matrix Q with real entries can be decomposed by SVD as:

$$\begin{aligned} Q = L\,D\,R^{T} \end{aligned}$$

such that \(L_{p\times r}\) and \(R_{q\times r}\) have orthonormal columns. The left and right singular vectors are given by the columns of L and R respectively [48]. The diagonal elements of the diagonal matrix

$$\begin{aligned} D=\begin{pmatrix} S &{} 0\\ 0 &{} 0 \end{pmatrix} \end{aligned}$$

are called the singular values, where \(S = diag(\sigma _1, \sigma _2, \ldots )\) with \(\sigma _1 \ge \sigma _2 \ge \cdots > 0\).

The eigenvectors of \(Q^*Q\) and \(QQ^*\) are known as right and left singular vectors of Q respectively.
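In MATLAB, this decomposition is available through the built-in svd function. The following sketch uses a toy matrix and the economy-size form, in which L is \(p\times r\), D is \(r\times r\) and R is \(q\times r\) with \(r=\min (p,q)\).

```matlab
% Sketch of SVD in MATLAB on a toy matrix; svd returns the singular
% values on the diagonal of D in non-increasing order.
Q = randn(6, 4);             % toy p-by-q real matrix
[L, D, R] = svd(Q, 'econ');  % Q = L*D*R'
sigma = diag(D);             % the singular values
recErr = norm(Q - L*D*R');   % reconstruction error, ~ machine precision
```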

2.2 Empirical Mode Decomposition (EMD)

Huang et al. suggested EMD in 1998 [19]. Non-stationary and non-linear time series data can be analysed by EMD. Its aim is to fragment the series into several IMF components together with a residue.

The steps of EMD are as follows:

a) Identify all the local extrema of the series \(x_{t}\).

b) Construct the lower envelope \(l_{m}\) by connecting all the local minima and the upper envelope \(l_{M}\) by connecting all the local maxima (e.g. by cubic spline interpolation).

c) Determine the mean \(a_{1_t}\) of \(l_{m}\) and \(l_{M}\), i.e.

$$\begin{aligned} a_{1_t}=\frac{l_{m}+l_{M}}{2} \end{aligned}$$

d) Determine \(i_{1_t}=x_t-a_{1_t}\).

e) If \(i_{1_t}\) satisfies the conditions for an IMF, assign \(i_{1_t}\) to the first IMF \(c_{1_t}\) and replace the actual series by the residue \(r_{1_t}=x_t-i_{1_t}\). Otherwise, \(i_{1_t}\) replaces the actual series \(x_t\).

f) Repeat steps (a) to (e).

The following can be used as stopping criteria for this process:

(i) the residue is a monotone function, so that no further IMF component can be extracted;

(ii) the numbers of extrema and of zero crossings are equal in two successive sifting steps.

By the above process, the original series \(x_t\) can be expressed as:

$$\begin{aligned} x_t=\sum _{j=1}^{n} c_{j_t} + r_{n_t}, \end{aligned}$$

where \(c_{j_t}\) represents the IMF components and \(r_{n_t}\) is the final residue which is a constant or a trend.
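One sifting step (steps (a) to (d) above) can be sketched in MATLAB as follows. This is a minimal sketch under stated assumptions: it uses findpeaks from the Signal Processing Toolbox to locate interior extrema and cubic splines for the envelopes, and it presumes the series has enough extrema for the interpolation to be meaningful.

```matlab
% Sketch of one sifting step of EMD (steps (a)-(d) above); assumes
% the Signal Processing Toolbox for findpeaks and enough interior
% extrema for spline interpolation.
function [iCand, aMean] = siftOnce(x)
x = x(:);  t = (1:numel(x))';
[~, iMax] = findpeaks(x);       % (a) locations of local maxima
[~, iMin] = findpeaks(-x);      % (a) locations of local minima
lM = spline(iMax, x(iMax), t);  % (b) upper envelope through the maxima
lm = spline(iMin, x(iMin), t);  % (b) lower envelope through the minima
aMean = (lm + lM) / 2;          % (c) envelope mean a_t
iCand = x - aMean;              % (d) IMF candidate i_t
end
```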

2.3 Ensemble Empirical Mode Decomposition (EEMD)

EEMD, developed by Wu and Huang in 2009 [20], is a noise-assisted extension of EMD. The problem of mode mixing, the main pitfall of EMD, can be reduced by deploying EEMD.

The steps of EEMD are as follows:

a) Construct new data sets \(x^j_n\) by adding several white noise realizations \(w^j_n\ (j=1, 2, \ldots , N)\) to the original data \(x_n\), i.e.

$$\begin{aligned} x^j_n=x_n+w^j_n \end{aligned}$$

b) Apply EMD to every \(x^j_n\ (j=1, 2, \ldots , N)\) to break it into components \(IMF^j_k\), where \(k=1, 2, \ldots , K\).

c) By averaging \(IMF^j_k\) over \(j=1, 2, \ldots , N\), deduce the k-th mode \({\overline{IMF}}_k\).
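The ensemble loop translates directly into MATLAB. The sketch below uses the built-in emd function of the Signal Processing Toolbox; the ensemble size N, mode count K and noise level eps0 are illustrative choices, not values from this study.

```matlab
% Sketch of EEMD using MATLAB's built-in emd; N, K and eps0 are
% illustrative choices.
x = x(:);  N = 100;  K = 7;  eps0 = 0.2*std(x);
imfSum = zeros(numel(x), K);
for j = 1:N
    xj = x + eps0*randn(size(x));    % (a) x^j_n = x_n + w^j_n
    imf = emd(xj, 'MaxNumIMF', K);   % (b) EMD of the j-th realization
    m = size(imf, 2);                % emd may return fewer than K modes
    imfSum(:, 1:m) = imfSum(:, 1:m) + imf;
end
imfBar = imfSum / N;                 % (c) ensemble means, the modes IMF_k
```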

2.4 Complete Ensemble Empirical Mode Decomposition With Adaptive Noise (CEEMDAN)

CEEMDAN, an extension of EEMD, was designed by Torres et al. in 2011 [27]. CEEMDAN is more efficient than EMD and EEMD at eliminating mode mixing and reducing computational cost, and it requires fewer sifting iterations than EEMD. In CEEMDAN, adaptive white noise is added at each stage of the decomposition and only the residual signal is computed at each step.

The steps of CEEMDAN are designed as follows:

For the series \(x_n\), denote the r-th EMD mode of \(x_n\) by \(E_r(x_n)\), and let \(w^j\) be realizations of white noise.

a) Use EMD to decompose N realizations \(x_n+\epsilon _0 w^j_n,\ j=1, \ldots , N,\) up to their first modes. Find out

$$\begin{aligned} {\widetilde{IMF}}_1=\frac{1}{N}\sum _{j=1}^{N}IMF_1^j={\overline{IMF}}_1 \end{aligned}$$

b) Find out the first residue as

$$\begin{aligned} r_1=x_n-{\widetilde{IMF}}_1 \end{aligned}$$

c) Apply EMD to the realizations \(r_1+\epsilon _1E_1( w^j_n),\ j=1, \ldots , N\) up to their first modes. Find out the second mode as

$$\begin{aligned} {\widetilde{IMF}}_2=\frac{1}{N}\sum _{j=1}^{N}E_1(r_1+\epsilon _1E_1( w^j_n)) \end{aligned}$$

d) Using the equation

$$\begin{aligned} r_m=r_{m-1}-{\widetilde{IMF}}_m, \end{aligned}$$

calculate the m-th residue.

e) Use EMD to decompose the realizations \(r_m+\epsilon _m E_m( w^j_n),\ j=1,\ldots ,N\) and obtain their first modes. Find out the (\(m+1\))-th mode as

$$\begin{aligned} {\widetilde{IMF}}_{m+1}=\frac{1}{N}\sum _{j=1}^{N}E_1(r_m+\epsilon _mE_m( w^j_n)) \end{aligned}$$

f) Go to step (d) for the next m.

Steps (d) to (f) are executed until the residue can no longer be decomposed. The final residue is therefore:

$$\begin{aligned} R~=~x_n-\sum _{m=1}^{M}{\widetilde{IMF}}_m. \end{aligned}$$

So, \(x_n\) can be expressed as:

$$\begin{aligned} x_n=\sum _{m=1}^{M}{\widetilde{IMF}}_m+R. \end{aligned}$$
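A single pass of steps (d) and (e) can be sketched as below. Here emdMode is a hypothetical helper that returns the m-th EMD mode of a signal, built on MATLAB's emd; r, w, epsm and N denote the current residue, the noise realizations (as columns), the noise amplitude and the ensemble size.

```matlab
% One pass of CEEMDAN steps (d)-(e); emdMode is a hypothetical helper.
acc = zeros(size(r));
for j = 1:N
    acc = acc + emdMode(r + epsm*emdMode(w(:, j), m), 1);
end
imfNext = acc / N;      % the (m+1)-th mode
rNext   = r - imfNext;  % the residue for the next iteration

% Hypothetical helper (local function or separate file): the m-th EMD
% mode of a signal s, via MATLAB's emd.
function imfm = emdMode(s, m)
modes = emd(s, 'MaxNumIMF', m);
imfm = modes(:, min(m, size(modes, 2)));
end
```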

2.5 Improved Complete Ensemble Empirical Mode Decomposition with Adaptive Noise (ICEEMDAN)

ICEEMDAN, developed by Colominas et al. [28] in 2014, can deal with non-linear and non-stationary data. The drawbacks of CEEMDAN are the appearance of residual noise in the components and the existence of spurious components; ICEEMDAN can effectively reduce both drawbacks.

The steps of ICEEMDAN are as follows:

Let the data be x. The operator \(E_j(x)\) is defined as the j-th EMD mode of x and M(x) as the local mean of the signal x. Let \(w^i\) be white noise and \(\left\langle \ \cdot \ \right\rangle \) the operation of averaging over all the realizations.

a) Use EMD to create I realizations \(x^{i}=x+\beta _{0}E_1(w^{i})\) and compute their local means to obtain the first residue

$$\begin{aligned} r_1=\left\langle M(x^{i})\right\rangle \end{aligned}$$

b) The first mode can be calculated as:

$$\begin{aligned} \widetilde{d_1}=x-r_1 \end{aligned}$$

c) Obtain the second residue by averaging the local means of the realizations \(r_1+\beta _{1}E_2(w^{i})\); the second mode is then the difference between the first and second residues:

$$\begin{aligned} r_2&=\left\langle M(r_1+\beta _{1}E_2(w^{i})) \right\rangle \\ \widetilde{d_2}&=r_1-r_2=r_1-\left\langle M(r_1+\beta _{1}E_2(w^{i})) \right\rangle \end{aligned}$$

d) For \(k=3, \ldots , K\) calculate the k-th residue

$$\begin{aligned} r_k=\left\langle M(r_{k-1}+\beta _{k-1}E_k(w^{i})) \right\rangle \end{aligned}$$

e) The k-th mode can be obtained by

$$\begin{aligned} \widetilde{d_k}=r_{k-1}-r_k. \end{aligned}$$

f) Go to step (d) for the next k.
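Steps (a) and (b) can be sketched as follows. The helpers firstMode and localMean are hypothetical: firstMode returns \(E_1(\cdot )\), e.g. via emd(s,'MaxNumIMF',1), and localMean returns \(M(\cdot )\), e.g. the envelope mean aMean computed by siftOnce in Sect. 2.2.

```matlab
% Sketch of ICEEMDAN steps (a)-(b); firstMode and localMean are
% hypothetical helpers for E_1(.) and the local mean M(.).
acc = zeros(size(x));
for i = 1:I
    xi  = x + beta0*firstMode(w(:, i));  % x^i = x + beta_0 E_1(w^i)
    acc = acc + localMean(xi);           % accumulate the local means M(x^i)
end
r1 = acc / I;                            % (a) first residue <M(x^i)>
d1 = x - r1;                             % (b) first mode
```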

2.6 SVD-based Time Series De-noising

Clean and noise parts of a time series can be separated by deploying SVD [49]. Let \(x_n\) be a signal such that

$$\begin{aligned} x_n = x_{n_s} + w_{n_w}, \end{aligned}$$

where \(x_{n_s}\) and \(w_{n_w}\) are the clean part and the white noise part of \(x_n\) respectively. In Hankel matrix representation, \(x_n\) can be expressed as:

$$\begin{aligned} Q= \begin{pmatrix} x_1 &{} x_2 &{} \cdots &{} x_k\\ x_2 &{} x_3 &{} \cdots &{} x_{k+1}\\ \vdots &{} \vdots &{} &{} \vdots \\ x_l &{} x_{l+1} &{} \cdots &{} x_n \end{pmatrix}. \end{aligned}$$

As \(x_n = x_{n_s} + w_{n_w}\), the Hankel representation has the form

$$\begin{aligned} Q = Q_s + Q_{w}, \end{aligned}$$

where \(Q,\ Q_s\) and \(Q_{w}\) are the Hankel representations of the actual, clean and white noise signals respectively. The matrix Q is decomposed by SVD as:

$$\begin{aligned} Q = L\,D\,R^{T}, \end{aligned}$$

where L and R are orthogonal matrices and D is a diagonal matrix with diagonal elements as singular values.

The SVD separates the data matrix into clean and noise parts. Since the singular vectors span the data space, the noise part corresponds to the singular values near zero.

The clean and noise subspace separation can be represented as:

$$\begin{aligned} Q = L\,D\,R^{T}=\begin{pmatrix} L_1&L_2 \end{pmatrix} \begin{pmatrix} D_1 &{} 0\\ 0 &{} D_2 \end{pmatrix} \begin{pmatrix} R_1^{T}\\ R_2^{T} \end{pmatrix} \end{aligned}$$

then

$$\begin{aligned} Q = L_1\,D_1\,R_1^{T} + L_2\,D_2\,R_2^{T}, \end{aligned}$$

where the singular values corresponding to clean and noise subspaces are respectively the diagonal entries of \(D_1\) and \(D_2\). Hence we have

$$\begin{aligned} Q_s = L_1\,D_1\,R_1^{T} \end{aligned}$$

and

$$\begin{aligned} Q_{w} = L_2\,D_2\,R_2^{T}. \end{aligned}$$

We need to determine a threshold on the singular values such that those lower than the threshold correspond to the noise subspace, and set them to zero. By plotting the singular values against their index, the threshold can be identified as the point at which the slope changes drastically.

The combined application of Hankelization and SVD thus establishes an effective denoising methodology: the two steps together enhance the ability to discern meaningful information from noisy components in complex time series data.
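The whole procedure, Hankel embedding, truncation of the small singular values and recovery of a series by averaging each skew-diagonal, can be sketched in MATLAB as follows; the window length l and threshold tau are user inputs, with tau read off the singular-value plot as described above.

```matlab
% Sketch of Hankel-SVD denoising: embed, zero the singular values
% below tau, reconstruct, and average each skew-diagonal back to a
% series.
function xd = svdDenoise(x, l, tau)
x = x(:);  n = numel(x);  k = n - l + 1;
Q = hankel(x(1:l), x(l:end));  % Hankel embedding of the series
[L, D, R] = svd(Q, 'econ');
s = diag(D);
s(s < tau) = 0;                % suppress the noise subspace
Qs = L*diag(s)*R';             % estimate of the clean part Q_s
xd = zeros(n, 1);              % average Qs along each skew-diagonal
for d = 1:n
    i = max(1, d-k+1):min(l, d);                 % rows on skew-diagonal d
    xd(d) = mean(Qs(sub2ind([l k], i, d-i+1)));  % columns are j = d-i+1
end
end
```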

2.7 Long Short Term Memory (LSTM) network

The LSTM network, with an architecture designed to handle time-dependent inputs and targets, is a powerful deep learning tool. Its ability to analyze and predict time series data stems from its capacity to resolve long-term dependency issues. At the heart of the LSTM network lies the memory cell, which enables the network to capture, retain and utilize temporal information and thereby to model intricate temporal patterns; this is what gives the LSTM its superior performance on dynamic, time-dependent datasets. The foundational principles and critical components of an LSTM network are detailed in [50].

2.8 ICEEMDAN - SVD - LSTM Prediction Model

This study combines ICEEMDAN, SVD and the LSTM network to propose a new time series prediction model. The proposed model has three levels: an ICEEMDAN level, an SVD level and an LSTM level. In level 1, ICEEMDAN produces a number of IMF components and a residue. In level 2, SVD is applied to the Hankel representations of the IMF components and the residue to de-noise them. In level 3, a forecast of each de-noised component is constructed using the LSTM network, providing insight into the potential future trajectory of the time series based on the denoised components. The forecast of the original data is obtained by adding all the component forecasts. Figure 1 illustrates the design of the ICEEMDAN - SVD - LSTM model. This multi-level approach, ICEEMDAN - SVD denoising followed by LSTM forecasting, constitutes a comprehensive framework for analyzing and predicting complex time series data. The curve plotting and programming are done in Matlab 9.10.0.
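Under stated assumptions, the three levels can be chained as in the sketch below: iceemdan denotes the reference implementation of Colominas et al. (with arguments noise standard deviation, number of realizations, maximum sifting iterations and SNRFlag), svdDenoise is the routine sketched in Sect. 2.6, and lstmForecast is a user-written wrapper around the network described in Sect. 3.4; horizon, l and the thresholds tau(m) are user inputs.

```matlab
% End-to-end sketch of the proposed model (hedged: iceemdan,
% svdDenoise and lstmForecast as described in the text above).
modes = iceemdan(x, 0.2, 500, 10, 1);  % level 1: IMFs + residue (rows)
yhat = zeros(1, horizon);
for m = 1:size(modes, 1)
    clean = svdDenoise(modes(m, :), l, tau(m));   % level 2: Hankel-SVD
    yhat  = yhat + lstmForecast(clean, horizon);  % level 3: per-mode LSTM
end
% yhat: forecast of the original series (sum of the mode forecasts)
```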

Fig. 1

Step-by-step diagram of ICEEMDAN - SVD - LSTM Prediction Model

3 Application of Methodology

3.1 Data

Wolf’s Sunspot Numbers represent a historical record of solar activity based on the observation of sunspots. While the specific index has evolved over time, it remains a crucial dataset for understanding long-term solar behaviour and its potential impact on Earth. To evaluate the proposed prediction model, Wolf’s Sunspot Numbers from the year 1700 to 1988 are used. The data is non-linear and of size 289. Figure 2 gives the time series plot of the data.

Fig. 2

Sunspot data (from the year 1700 to 1988)

3.2 Segregation of IMF Components

In the first level, ICEEMDAN is used to decompose the given data into several IMF components. Here, ICEEMDAN is started with 500 realizations, each created by adding to the original data series the first EMD mode of a white noise realization scaled by the noise standard deviation. The maximum number of sifting iterations is set to 10, and SNRFlag is set to 1, i.e., the SNR (Signal-to-Noise Ratio) increases at every stage. As a result, ICEEMDAN produces seven IMF components, namely IMF1, IMF2, ..., IMF7, and a residue. This decomposition allows us to extract and analyze the underlying oscillatory modes and patterns within the time series data. The components IMF1 through IMF7 capture different frequency bands, with IMF1 representing the highest frequency and IMF7 the lowest. In Fig. 3, these components are presented in their order of extraction, giving a clear depiction of the signal’s frequency content from higher to lower frequencies. Notably, the final component of the decomposition represents the overall trend of the time series, encapsulating the long-term behaviour of the original data.

Fig. 3

The Extricated IMF components of Sunspot data by ICEEMDAN

3.3 De-noising the IMF Components by SVD

To enhance the denoising in level 2, each of the IMF components and the residue obtained from ICEEMDAN in level 1 is individually transformed into a Hankel matrix. This transformation is a pivotal preprocessing step, organizing the temporal structure of each component into a coherent matrix representation for the subsequent application of SVD, while preserving the inherent patterns and characteristics of each series. Subsequently, SVD is applied to all the Hankel matrices, breaking each one down into singular values and the corresponding left and right singular vectors. The key advantage lies in the ability to distinguish the dominant patterns in the data from the noise components that may be present.

The crux of the denoising process lies in the judicious selection of singular values. By retaining only the part of the Hankel representation corresponding to the above-threshold singular values, we keep the essential patterns while effectively filtering out the noise. The series recovered from this retained part is the noise-reduced version of the data, preserving the significant information. This refined dataset serves as a foundation for more accurate and reliable analyses: it improves the interpretability of the underlying patterns and ensures that subsequent modeling and forecasting are based on a cleaner representation of the original time series.

3.4 Forecasting by LSTM

In level 3, the LSTM network is used to predict future values of each de-noised series. Leveraging the capabilities of LSTM, known for its proficiency in capturing temporal dependencies, allows us to model and predict the evolving patterns within the denoised time series. To facilitate this predictive modeling, the denoised series generated through the ICEEMDAN-SVD process serve as input sequences for the LSTM network. By training the LSTM on the first 90 percent of each denoised series and evaluating its performance on the remaining 10 percent, we ensure a robust assessment of its predictive accuracy. The training and testing split of 90-10 allows the LSTM network to learn from the majority of the historical data while providing a rigorous evaluation of unseen data, thus gauging its generalization capability.

The architecture of the LSTM network utilized in this study consists of several key components tailored to capture temporal dependencies and patterns within the input data. The input layer accepts input sequences with a specified number of features. In our case, as the input data is univariate time series data representing sunspot numbers, the input layer is configured with a single feature. In our implementation of the LSTM network, we employed a single LSTM layer with 500 hidden units. The choice of the number of hidden units is based on experimentation and aims to balance model complexity and predictive performance. We utilized the tanh activation function for the LSTM layer. Following the LSTM layer, a Fully Connected Layer is incorporated to map the LSTM outputs to a single output value. This layer facilitates the transformation of LSTM features into the desired output format for regression tasks. The regression layer serves as the output layer of the LSTM network, responsible for producing continuous-valued predictions. It computes the loss between the predicted values and the ground truth values during training, facilitating the optimization of the network parameters.

The LSTM network is trained using the Adam optimizer, a variant of stochastic gradient descent (SGD) known for its efficiency and effectiveness in training deep neural networks. The training process is iterated over a maximum of 250 epochs, controlling the duration of the training procedure. The initial learning rate is set to 0.005. After every 125 epochs, the learning rate is decayed by a factor of 0.2, facilitating smoother convergence and preventing overfitting.
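The architecture and training options described above translate directly into Deep Learning Toolbox code. The following is a sketch, assuming standardized univariate training sequences XTrain and YTrain.

```matlab
% Sketch of the network and training options described above
% (Deep Learning Toolbox).
layers = [ ...
    sequenceInputLayer(1)            % one feature: the denoised series
    lstmLayer(500)                   % 500 hidden units, tanh activation
    fullyConnectedLayer(1)           % map LSTM features to one output
    regressionLayer];                % loss for continuous-valued targets
options = trainingOptions('adam', ...
    'MaxEpochs', 250, ...
    'InitialLearnRate', 0.005, ...
    'LearnRateSchedule', 'piecewise', ...
    'LearnRateDropPeriod', 125, ...
    'LearnRateDropFactor', 0.2, ...
    'Verbose', 0);
net = trainNetwork(XTrain, YTrain, layers, options);  % 90% training split
```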

The predictions generated by the LSTM network for each denoised series are then aggregated to produce an overall forecast of the actual data. This aggregation process consolidates the individual predictions from each ICEEMDAN-SVD component, resulting in a comprehensive prediction of the future values for the entire time series. In Fig. 4, the ICEEMDAN-SVD modes and their corresponding LSTM predictions are visually depicted, offering a clear illustration of the model’s forecasting performance. This visual representation enables a qualitative assessment of how well the LSTM captures the intricate temporal dynamics present in each denoised series, showcasing its effectiveness in predicting future values.

Fig. 4

Actual and predicted ICEEMDAN-SVD modes by LSTM

3.5 Comparison

The seven other models this study utilized to measure the efficacy of the recommended model are the LSTM, EMD - LSTM, EEMD - LSTM, CEEMDAN - LSTM, EEMD - SVD - LSTM, ICEEMDAN - LSTM and CEEMDAN - SVD - LSTM models. The LSTM model forecasts the data by applying LSTM directly. In the EMD - LSTM model, EMD produces five IMFs and a residue, each of which is predicted by LSTM, and the predictions are added to obtain the forecast of the actual series. In the EEMD - LSTM model, EEMD first fragments the data into eight IMF components and a residue; the forecast of the actual data is then obtained by summing the LSTM predictions of the components and the residue. In the CEEMDAN - LSTM model, each of the eight IMF components and the residue obtained by CEEMDAN is predicted by LSTM, and their sum gives the forecast of the original series. In the EEMD - SVD - LSTM model, the eight IMF components and the residue obtained by EEMD are de-noised by SVD before being forecasted by LSTM, and the component forecasts are added to obtain the forecast of the original series. In the ICEEMDAN - LSTM model, LSTM is applied to each of the seven IMF components and the residue obtained by ICEEMDAN and, as before, the component forecasts are added to obtain the forecast of the actual series. In the CEEMDAN - SVD - LSTM model, CEEMDAN first decomposes the data into eight IMF components and a residue, SVD then de-noises each component and the residue, LSTM forecasts each of the CEEMDAN - SVD modes, and adding all the forecast series gives the forecast of the original series. The modes produced by EMD, EEMD, CEEMDAN, EEMD - SVD, ICEEMDAN and CEEMDAN - SVD and their LSTM forecasts are depicted in Figs. 5, 6, 7, 8, 9 and 10.

Fig. 5

Actual and predicted EMD modes by LSTM

Fig. 6

Actual and predicted EEMD modes by LSTM

Fig. 7

Actual and predicted CEEMDAN modes by LSTM

Fig. 8

Actual and predicted EEMD - SVD modes by LSTM

Fig. 9

Actual and predicted ICEEMDAN modes by LSTM

Fig. 10

Actual and predicted CEEMDAN - SVD modes by LSTM

3.6 Performance Measures

Many performance measures are used in the literature to validate the efficiency of forecasting models. Here we utilize the Root Mean Squared Error (RMSE) and the Mean Absolute Error (MAE), defined respectively as:

$$\begin{aligned} RMSE&=\sqrt{\frac{1}{m}\sum _{i=1}^{m}(y_i-\widehat{y_i})^2}\\ MAE&=\frac{1}{m}\sum _{i=1}^{m}|y_i-\widehat{y_i}| \end{aligned}$$
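Both measures translate directly into MATLAB, with y the observed and yhat the predicted values, each of length m.

```matlab
% RMSE and MAE for observed y and predicted yhat (vectors of length m).
rmse = sqrt(mean((y - yhat).^2));
mae  = mean(abs(y - yhat));
```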

4 Results and Discussion

In this study, a novel approach for predicting non-linear time series is introduced through the development of a hybrid ICEEMDAN - SVD - LSTM model. To evaluate its efficacy, the model is applied to Wolf’s Sunspot Numbers, a canonical dataset renowned for its complexity and non-linear characteristics. ICEEMDAN is run with 500 realizations, a maximum of 10 sifting iterations and a Signal-to-Noise Ratio that increases at every stage, decomposing the data into seven IMF components and a residue. In the application of SVD to the Hankel representation of each IMF component to identify the noise subspace, the singular values not greater than 2.19861 are considered zero in the case of the Hankel matrix corresponding to IMF1. The corresponding thresholds for the Hankel matrices of IMF2, IMF3, ..., IMF7 and the residue are respectively 27.7471, 9.77947, 3.90104, 4.04227, 6.2429, 1.38051 and 56.4937. To find these values, we plot the singular values of each Hankel matrix against their index and take as the threshold the point at which the slope changes drastically. In each case, the part of the Hankel representation corresponding to the non-zero singular values is used as the denoised part of the corresponding series. These denoised parts are then forecasted by the LSTM network as described in the previous section, and aggregating the forecasted series yields the forecast of the actual series.

In addition to our proposed model, we benchmarked its performance against seven established techniques: LSTM, EMD - LSTM, EEMD - LSTM, CEEMDAN - LSTM, EEMD - SVD - LSTM, ICEEMDAN - LSTM, and CEEMDAN - SVD - LSTM. The results of this comparative analysis are summarized in Table 1, showcasing the efficiency and effectiveness of our hybrid model. Furthermore, Fig. 11 presents a visual depiction of the observed and predicted Sunspot data, along with the corresponding error analyses conducted using each of the evaluated models. This comprehensive assessment provides valuable insights into the predictive capabilities of our proposed approach and its superiority in accurately capturing the dynamics of complex non-linear time series data.

Table 1 Comparison of the proposed model with other models
Fig. 11

Observed versus Predicted Sunspot data and the Errors by LSTM, EMD - LSTM, EEMD - LSTM, CEEMDAN - LSTM, EEMD - SVD - LSTM, ICEEMDAN - LSTM, CEEMDAN - SVD - LSTM and ICEEMDAN - SVD - LSTM models

5 Conclusion

Time series forecasting holds a pivotal role across diverse domains in the modern world, influencing decision-making processes and strategic planning in various sectors. The forecasting of non-linear time series is particularly crucial in the contemporary era, offering the potential to provide nuanced insights that can profoundly impact the future landscape of multiple industries. In response to the intricate challenges posed by non-linear time series forecasting, this study introduces a novel hybrid model: ICEEMDAN - SVD - LSTM. Comprising ICEEMDAN, SVD, and LSTM Network, this model exhibits a synergistic blend of established techniques to address the complexities inherent in non-linear time series data. The proposed hybrid model undergoes a rigorous comparative analysis against several existing models, including LSTM, EMD - LSTM, EEMD - LSTM, CEEMDAN - LSTM, EEMD - SVD - LSTM, ICEEMDAN - LSTM and CEEMDAN - SVD - LSTM. Through comprehensive experimentation and evaluation, our findings robustly affirm the superior efficiency of the ICEEMDAN - SVD - LSTM model. It consistently outperforms traditional models, showcasing its capacity to extract meaningful patterns, reduce noise, and accurately forecast future values in non-linear time series data.

This venture not only contributes a state-of-the-art forecasting model but also emphasizes the importance of adopting hybrid approaches that leverage the strengths of multiple techniques. As we navigate an era increasingly characterized by intricate data dynamics, the proposed model stands as a promising tool for advancing the precision and reliability of time series forecasting, thereby empowering decision-makers across diverse industries to make informed and strategic choices for the future.

The hybrid approach we have constructed makes use of ICEEMDAN, which is based on Empirical Mode Decomposition (EMD). EMD’s primary focus is on identifying the underlying oscillatory patterns within a time series, not on detecting structural breaks or abrupt shifts. Structural breaks often involve sudden changes in the statistical properties of the data, such as shifts in the mean, variance or other characteristics, and EMD does not explicitly address changes in these properties. Hence the proposed hybrid approach may not be suitable for the analysis of non-linear time series with structural breaks. Incorporating into the proposed model statistical methods for detecting structural breaks, such as the Chow test, the CUSUM test and the Bai-Perron test, is a possible direction for future work.