1 Introduction

With the continuous development of economic and financial markets, an increasing number of factors affect stock market transactions. In addition to basic market factors, national tax policies, the macro-economy, financial conditions, and investors’ irrational psychology and behavior exert distinct yet interrelated effects on stock prices. Financiers, researchers, and governments pay close attention to stock index price forecasting, which is highly challenging because the stock market is a dynamic, unstable, and chaotic system [1, 2]. Therefore, to forecast the stock price accurately, this paper aims to enhance the model’s predictive performance.

Various stock price prediction models have appeared in recent decades. The existing models for forecasting stock prices are generally categorized into statistical approaches, artificial intelligence technologies, and hybrid models [3]. Popular statistical methods, such as the autoregressive moving average (ARMA) [4], autoregressive integrated moving average (ARIMA) [5], and generalized autoregressive conditional heteroskedasticity (GARCH) [6], perform well in a relatively stable stock market [5, 7]. For instance, Kristjanpoller and Michell [8] employed different GARCH models to predict stock market fluctuations; a Markov switching method was then applied to specify the states of external effects, and the experimental results indicated that the GARCH model with external factors could improve forecasting accuracy. Pai and Lin [7] utilized the ARIMA model to capture linear patterns, and their results revealed that ARIMA has a unique strength in forecasting stock price trends.

Due to the nonlinear behavior and fast variations of stock prices, the ability of statistical models to forecast the stock price is limited [9,10,11]. With the rapid growth of artificial intelligence, several efficient artificial intelligence approaches with strong robustness and fault tolerance have been presented in the literature for predicting the stock price, such as artificial neural networks (ANNs), including the Elman neural network (ENN) [3], the generalized regression neural network (GRNN) [12], the radial basis function network (RBFN) [13], and the wavelet neural network (WNN) [3]; fuzzy logic methods [14]; the support vector machine (SVM) [15, 16]; the least-squares support vector machine (LSSVM) [17]; and the extreme learning machine (ELM) [18]. For example, Wang et al. [3] utilized the ENN, which can process dynamic data, to forecast four types of global stock indices; the experimental results indicated that the ENN could improve prediction accuracy with fewer hidden neurons. Paiva et al. [15] established a unique decision-making model using the SVM and the mean-variance approach for portfolio selection: first, the SVM was employed to forecast the stock exchange index; next, the mean-variance method was conducted to allocate the investment funds. This fusion approach extended the theoretical usage of machine learning to propose a potentially practical method for stock price prediction.

Although various stock price prediction models have been established in the literature, the mentioned models are not generic. Notably, the statistical methods suffer from poor extrapolation ability, a narrow prediction scale, and a significant dependency on data: they can only be employed for linear data and are inappropriate for fluctuating, noisy data [19, 20]. Although the grey model (GM) is appropriate for data with an exponential-type trend, it cannot describe undulating series trends [21, 22]. The initial random weights and thresholds of ANNs can influence the forecasting accuracy, increasing the instability of the forecasting model [19]. Regarding the mentioned drawbacks, scholars have integrated traditional models to develop hybrid forecasting models, attaining the desired effect and becoming a mainstream approach [23]. For instance, Zolfaghari and Gholami [5] established a hybrid model by combining financial factors and ARIMAX-GARCH family models for forecasting the stock price index; compared with the benchmark models, the results reflected an enhancement in the hybrid model’s stock index prediction. Kristjanpoller et al. [6] combined ANFIS and GARCH techniques to specify individual effects on a stock index and used an ANN framework to enhance the stock price’s prediction performance. The results indicated that this approach could better estimate stock index fluctuations.

Artificial intelligence methods are prevalent in the stock field, but they are unstable, and parameter initialization can affect their results [24,25,26]. To solve the instability problem of the ANNs, various optimization algorithms, such as particle swarm optimization (PSO) [3, 27], the firefly algorithm [28], the whale optimization algorithm (WOA) [29, 30], and an enhanced version of the cuckoo search algorithm (CSA) [31], have been utilized to optimize the initial weights and thresholds of ANNs. Chandar [32] employed neural networks optimized by genetic algorithms and PSO for forecasting the intraday stock price; the results demonstrated that the PSO-BPNN model could maximize forecasting precision when estimating the intraday stock price. Similarly, Das et al. [33] modified the ELM with the crow search algorithm to obtain the optimal weights for forecasting the short-term closing price of stocks; high precision was obtained when validating the established model on several actual markets.

While obtaining accurate prediction results, it is also a challenge to capture the trend characteristics and original signals of stock data. Hybrid prediction models are established to solve these problems. In order to distinguish and derive the primary characteristics of stock price data, hybrid models have been combined with various decomposition methods, such as the wavelet transform (WT) [5], wavelet packet transform (WPT) [34], singular spectrum analysis (SSA) [35], variational mode decomposition (VMD) [36], empirical mode decomposition (EMD) [37], and complementary ensemble empirical mode decomposition (CEEMD) [3]. However, these decomposition methods produce too many intrinsic mode functions (IMFs), increasing the time cost and accumulating more errors after multiple stacking. Fusing IMFs with similar frequencies to reduce the time cost and cumulative error is therefore a valuable way to improve the decomposition methods.

Based on the above analysis, the existing stock price prediction methods have two disadvantages. On the one hand, the internal signal of the original data cannot be revealed. On the other hand, the IMFs obtained by the decomposition method carry cumulative errors. Therefore, aiming to solve these problems, this paper proposes a new stock price prediction framework based on the theory of “granular computing” and the idea of “decomposition and integration”. First, the collected unstable stock price data are processed by a decomposition method to obtain subsequences with different characteristics. Then, a complexity measurement method is used to calculate the entropy of the subsequences and, to alleviate the cumulative error, the sequences with similar entropy are merged, revealing the essential characteristics of the stock price data. Next, the neural network improved by the optimization algorithm is utilized to forecast each merged subsequence, and the forecasting results are integrated. Specifically, the stock price forecasting model based on ICA and CEEMD is established to reveal the primitive characteristics of stock price data and to construct a more efficient hybrid model than the traditional ones. The established model comprises four units: a data preprocessing unit composed of CEEMD and SE (CS), a reconstitution and analysis unit, an optimization unit, and a prediction unit. The data preprocessing unit and the reconstitution and analysis unit together present a decomposition technique that overcomes the deficiencies of traditional decomposition techniques. Meanwhile, to reveal the primary features of the stock price data, the ICA is performed to separate the independent components from the IMFs. The long short-term memory (LSTM) network optimized by the PSO algorithm is then employed to forecast the IMFs. In the end, the final prediction values are achieved by integrating the forecasting results of all IMFs.

The remainder of this study is organized as follows. Section 2 describes the CEEMD, SE, ICA, and LSTM approaches. The established stock price prediction model is given in Sect. 3. Section 4 provides the experimental results. The conclusion and future research are presented in Sect. 5.

2 Preliminaries

The current section presents the comparative methods of the established model, including the CEEMD and ICA.

2.1 CEEMD

EMD has been extensively utilized to transform a complex initial signal with nonlinear and nonstationary characteristics into a class of fluctuating modes described by IMFs, under the assumption that each signal can be expressed as a mixture of IMF series. An IMF must satisfy two conditions: the numbers of extrema and of zero-crossings should be equal or differ by at most one throughout the dataset, and the mean of the envelopes defined by the local maxima and minima should be zero. The second condition indicates that the upper and lower envelopes are symmetrically distributed along the zero axis. Nevertheless, the EMD approach may cause modal aliasing, which can be resolved through noise-assisted analysis. Owing to the uniform frequency distribution of the Gaussian white noise added to the series, the signal gains continuity at various scales, avoiding modal aliasing to a certain extent [38]. In order to prevent mode mixing in EMD, the EEMD incorporates random Gaussian white noise into the original series data [39]. Yeh et al. [40] further extended EEMD to CEEMD, in which the time series can be reconstructed to eliminate the residual auxiliary noise and attain a more complete decomposition. For a given original time series, the CEEMD algorithm is described as follows:

  (1)

    N pairs of Gaussian white noise with positive and negative signs are added to the original signal, and the resultant 2N signals can be described as:

    $$\begin{aligned} \left[ \begin{array}{l} M_{1} \\ M_{2} \end{array}\right] =\left[ \begin{array}{cc} 1 &{} 1 \\ 1 &{} -1 \end{array}\right] \left[ \begin{array}{l} X \\ n \end{array}\right] \end{aligned}$$
    (1)

    where X and n stand for the original signal, and the Gaussian white noise, respectively. \(M_1\) and \(M_2\) stand for the signals obtained after adding or subtracting the original signal to or from the noise, respectively.

  (2)

    The EMD is applied to each of the 2N target signals to obtain a class of IMF components, where \(IMF_{ij}\) stands for the jth IMF of the ith signal.

  (3)

    The jth IMF is obtained by averaging over the entire ensemble, as described in the following:

    $$\begin{aligned} I M F_{j}=\frac{1}{2 N} \sum _{i=1}^{2 N} I M F_{i j} \end{aligned}$$
    (2)

Now, the final signal Y(t) can be generated after applying CEEMD as:

$$\begin{aligned} Y(t)=\sum _{j=1}^{k} {IMF}_{j}(t)+r e s \end{aligned}$$
(3)

where res is the residual term.
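The paired-noise construction of Eq. (1) and the ensemble averaging of Eq. (2) can be sketched in a few lines. The sketch below is illustrative, not the authors' implementation: the EMD step is delegated to any callable (here a hypothetical `toy_emd` stand-in based on a moving average), and `n_pairs`, `noise_std`, and `seed` are assumed parameters.

```python
import numpy as np

def ceemd(x, emd, n_pairs=10, noise_std=0.05, seed=0):
    """Sketch of CEEMD following Eqs. (1)-(2): add N pairs of positive/
    negative Gaussian white noise, decompose each of the 2N noisy copies
    with EMD, and average the IMFs over the whole ensemble."""
    rng = np.random.default_rng(seed)
    ensemble = []
    for _ in range(n_pairs):
        n = noise_std * np.std(x) * rng.standard_normal(len(x))
        # Eq. (1): M1 = X + n and M2 = X - n, so the added noise
        # cancels when the ensemble is averaged
        ensemble.append(emd(x + n))
        ensemble.append(emd(x - n))
    # Eq. (2): IMF_j = (1 / 2N) * sum_i IMF_ij
    return np.mean(ensemble, axis=0)

def toy_emd(s):
    """Stand-in decomposer returning two 'IMFs' (detail + trend);
    a real EMD sifting implementation would be plugged in instead."""
    trend = np.convolve(s, np.ones(5) / 5, mode="same")
    return np.stack([s - trend, trend])

t = np.linspace(0.0, 1.0, 200)
x = np.sin(2 * np.pi * 5 * t) + t
imfs = ceemd(x, toy_emd)
```

Because the noise is added in positive/negative pairs, the averaged IMFs still sum back to the original signal, which mirrors the completeness property of Eq. (3).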

2.2 ICA


As a signal separating technique, the ICA approach has been presented by Comon [41]. The ICA aims to extract source signals \(s_i,(i=1,2,\ldots ,n)\) from the input signal Y(t). If Y(t) is a linear combination of \(s_i,(i=1,2,\ldots ,n)\), it can be represented with the following matrix form

$$\begin{aligned} Y=AS=\sum \limits _{i=1}^n a_is_i^T \end{aligned}$$
(4)

where \(Y=[y_1,y_2,\ldots ,y_n]^T\) stands for the input signal vector, \(A=[a_1,a_2,\ldots ,a_n]\in R^{n\times n}\) is the mixing matrix, with \(a_i\) the ith column of A; in general, A is unknown. \(S=[s_1,s_2,\ldots ,s_n]\) is the source signal vector, whose components \(s_i\) and \(s_j\) are mutually independent. The ICA aims to obtain the separation matrix \(M\in R^{n\times n}\) such that

$$\begin{aligned} X=MY=MAS \end{aligned}$$
(5)

where X stands for the estimated value of S. Since ICA involves uncertainties in variance and ordering, \(x_i\) is not necessarily an estimate of \(s_i\) in the original scale or order. Three assumptions are made: the first is that the mixing matrix A is of full rank, the second is that the source signals are mutually independent, and the third is that at most one source signal follows a Gaussian distribution.
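The mixing and separation relations of Eqs. (4)-(5) can be verified numerically. The sketch below is a minimal illustration under assumed toy sources (a square wave and uniform noise) and an assumed full-rank mixing matrix; it uses the *known* inverse of A as the separation matrix, whereas actual ICA must estimate M from Y alone.

```python
import numpy as np

# Two independent, non-Gaussian sources (assumptions 2 and 3)
rng = np.random.default_rng(1)
t = np.linspace(0.0, 1.0, 500)
S = np.vstack([np.sign(np.sin(2 * np.pi * 7 * t)),   # square wave
               rng.uniform(-1.0, 1.0, t.size)])      # uniform noise
A = np.array([[0.8, 0.3],
              [0.2, 0.9]])     # full-rank mixing matrix (assumption 1)
Y = A @ S                      # Eq. (4): observed mixtures Y = A S

# With the ideal separation matrix M = A^{-1}, Eq. (5) gives
# X = M Y = M A S = S; in practice A is unknown and M must be
# estimated (e.g. by an ICA algorithm), leaving scale and order
# ambiguities in the recovered components.
M = np.linalg.inv(A)
X = M @ Y
```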

3 CEEMD and ICA-Based Stock Price Forecasting Model

The current section adopts a decomposition-hybridization algorithm based on the CEEMD, SE, ICA, PSO, and LSTM methods to forecast the stock price, abbreviated as CS-ICA-PSO-LSTM. The performance measures used to assess the forecasting accuracy of the established model are presented in subsection 3.3.

It is challenging for traditional approaches to efficiently obtain the measured data’s dynamic features due to the stock price’s nonlinear and non-stationary features. Zhu [42] indicated that nonlinear and non-stationary problems could be solved using the data envelopment analysis approach. Accordingly, inspired by the research work of [42], we construct the CS-ICA-PSO-LSTM forecasting model of stock price. We call the four-tuple (U, E, I, F) a hybrid forecasting system, where \(U=\{x_1, x_2, \ldots , x_n\}\) is a finite nonempty set, E is the value of sample entropy, \(I=\{IMF_1, IMF_2, \ldots , IMF_n, Res \}\) is the set of decomposition results, and \(F=\{f_1, f_2, \ldots , f_m \}\) refers to the clustering results. The established model comprises the following steps.

3.1 Decomposition and Reconstruction of Time Series of Stock Price Based on CEEMD and SE

Influenced by various information, the time series of stock price exhibits strong nonstationarity and irregularity. To address this problem, the theories of “granular computing” and “decomposition and ensemble” are introduced in this paper. The mode decomposition approach is adopted for decomposing the stock price into IMFs. As a self-adaptive decomposition approach, EMD [37] can decompose the measured data into various IMFs; however, boundary effects and mode mixing are its main shortcomings. In order to resolve these difficulties, the stock price is decomposed into various IMFs using the CEEMD [40]. Because the CEEMD method transforms a complex time series into relatively stable and regular IMFs, the hidden information becomes easy to mine. The detailed process and definition are presented as follows:

Definition 3.1

Let (U, E, I, F) be a hybrid forecasting system and let \(\varepsilon\) denote the decomposition error. The relation of U and I is defined as follows:

$$\begin{aligned} \left| x_{i}-\sum _{j=1}^{m} IMF_{ij}\right| \le \varepsilon \quad or \quad \left| X-\sum _{j=1}^{m} IMF_{j}\right| \le \varepsilon \quad (i=1, 2, \ldots , n) \end{aligned}$$
(6)

However, too many IMFs lead to high time costs and cumulative errors. The current paper therefore employs the SE to measure the complexity of the stock price components, and the IMFs with similar complexity are then fused. Sample entropy (SE), presented by Richman et al. [43], is utilized to verify the complexity of a data series. Compared with approximate entropy, SE depends much less on the data length and has good immunity to noise interference; even if only two-thirds of the input data remain, the computation is not significantly influenced. It can fully describe the signal entropy and involves three crucial parameters: the similarity tolerance r, adjusted to 0.2STD, where STD denotes the standard deviation of the series; the embedding dimension m, chosen as 2; and the size of the time series [44]. The SE value is computed through the following steps:

  (1)

    \(\mathbf {Y}=(y(1),y(2),\ldots ,y(n))\) is reconstructed into the following matrix form:

    $$\begin{aligned} \tilde{\mathbf {Y}}=\left[ \begin{array}{c} y(1), y(2), \cdots , y(n-m+1) \\ y(2), y(3), \cdots , y(n-m+2) \\ \vdots \\ y(m), y(m+1), \cdots , y(n) \end{array}\right] \end{aligned}$$
    (7)
  (2)

    The distance between vectors \(\mathbf {y}(i)\) and \(\mathbf {y}(j)\), denoted by \(d[\mathbf {y}(i),\mathbf {y}(j)]\), is described as:

    $$\begin{aligned}&d[\mathbf {y}(i), \mathbf {y}(j)]=\max ( \mid \mathbf {y}(i+k)-\mathbf {y}(j+k) \mid )\nonumber \\&\quad (1 \le k \le m-1 ; 1 \le i \ne j \le n-m+1) \end{aligned}$$
    (8)

    where k stands for the step length.

  (3)

    Considering a threshold r, let \(B_i\) denote the number of vectors \(\mathbf {y}(j)\) satisfying the condition \(d[\mathbf {y}(i),\mathbf {y}(j)]\le r\) for a specific \(\mathbf {y}(i)\); the ratio \(B_i^{m}(r)\) can then be obtained as:

    $$\begin{aligned} B_{i}^{m}(r)=\frac{1}{n-m+1} B_{i} \end{aligned}$$
    (9)
  (4)

    Calculate the mean \(B^m(r)\) of \(B_i^m(r)\) as:

    $$\begin{aligned} B^{m}(r)=\frac{1}{n-m} \sum _{i=1}^{n-m} B_{i}^{m}(r) \end{aligned}$$
    (10)
  (5)

    Update m to \(m+1\) and repeat steps (1) to (4) to derive \(B^{m+1}(r)\) as:

    $$\begin{aligned} B^{m+1}(r)=\frac{1}{n-m} \sum _{i=1}^{n-m} B_{i}^{m+1}(r) \end{aligned}$$
    (11)
  (6)

    Obtain the SE value as:

    $$\begin{aligned} E(m, r)=\lim _{n \rightarrow \infty }\left\{ -\ln \left[ \frac{B^{m}(r)}{B^{m+1}(r)}\right] \right\} \end{aligned}$$
    (12)

The SE values determine the series’ nonstationary degree; that is, the series complexity increases with the increase of the SE value.
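Steps (1)-(6) can be sketched directly in code. The following is a minimal, illustrative implementation (the limit in Eq. (12) is replaced by its finite-sample estimate, as is standard); the function name and the demo series are our own, not the paper's.

```python
import math

def sample_entropy(series, m=2, r_factor=0.2):
    """Sample entropy following steps (1)-(6): SE = -ln(A/B), where B
    counts pairs of length-m templates within tolerance r (Chebyshev
    distance, Eq. 8) and A is the analogous count for length m+1."""
    n = len(series)
    mean = sum(series) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in series) / n)
    r = r_factor * std  # similarity tolerance, r = 0.2 * STD

    def match_count(dim):
        temps = [series[i:i + dim] for i in range(n - dim + 1)]
        c = 0
        for i in range(len(temps)):
            for j in range(i + 1, len(temps)):
                if max(abs(a - b) for a, b in zip(temps[i], temps[j])) <= r:
                    c += 1
        return c

    b, a = match_count(m), match_count(m + 1)
    return float("inf") if a == 0 or b == 0 else -math.log(a / b)

# A strictly periodic series is highly regular, so its SE is near zero
se_regular = sample_entropy([1.0, 2.0] * 30)
```

A more irregular series yields a larger SE, which is exactly the property used below to group IMFs by complexity.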

3.2 Independent Component Extraction and Prediction

The revelation of source signals is the fundamental stage in establishing a stock price forecasting model. As a commonly used approach to determine the measured data’s underlying parameters, ICA has been introduced into the revelation of source signals [41], and it has excellent efficiency in grasping the internal properties of nonlinear and non-stationary signals. As a powerful data analysis tool, ICA can find the hidden factors or components in multivariate (multidimensional) statistical data and is considered an extension of principal component analysis and factor analysis. For the problem of blind source separation, ICA separates, or approximately separates, the source signals without knowing the source signals, the noise, or the mixing mechanism.

Definition 3.2

Given a hybrid forecasting system (U, E, I, F), let \(S=[s_1,s_2,\ldots ,s_n]\) be the source signal vector, M denote the separation matrix, A represent the mixing matrix, and Y refer to the separating signal. For any \(X \in U\), the following relation holds:

$$\begin{aligned} X=MY=MAS \end{aligned}$$
(13)

Finally, the LSTM is modeled on the resulting ICs to establish a stock price prediction model. Notably, the LSTM’s prediction accuracy depends on the choice of hyperparameters. In order to attain accurate forecasting results, the PSO [45] algorithm is adopted to find the optimal values of the LSTM hyperparameters. The PSO, first presented by Eberhart et al. [46] in 1995, is one of the swarm intelligence optimization algorithms. The algorithm originated from the study of bird predation: the simplest and most effective way of finding food is to search the area surrounding the birds that have already found it. Inspired by flocks of birds (particles) looking for food, PSO employs the information shared between particles to determine the global optimal solution. After initializing the particles, an iterative procedure is adopted to find the global optimal solution. In every iteration, the ith particle uses the individual optimal position \(p_{best}\) and the swarm optimal position \(g_{best}\) to update its current position \(p_i\) and velocity \(v_i\), respectively. See [47, 48] for a deeper understanding of the PSO algorithm and its applications.

Consider a D-dimensional search space. The ith particle is denoted by a D-dimensional vector \(Y_i=[y_{i1},y_{i2},\ldots ,y_{iD}]\), which represents the location of the ith particle in the D-dimensional space, i.e., a potential solution to the problem. In every iteration k, the particle changes its velocity \(V_i\) and location according to the personal and global extrema with the following updating rules:

$$\begin{aligned}&V_{i}(k+1)=\omega V_{i}(k)+c_{1} r_{1}\left( P_{i}^{ {best }}(k)-Y_{i}(k)\right) \nonumber \\&\quad\quad\quad\quad +c_{2} r_{2}\left( P_{g}^{ {best }}(k)-Y_{i}(k)\right) , \end{aligned}$$
(14)
$$\begin{aligned}&Y_{i}(k+1)=Y_{i}(k)+V_{i}(k+1), \end{aligned}$$
(15)

where \(\omega\) refers to the inertia weight, \(c_1\) and \(c_2\) describe the acceleration factors, and \(r_1\) and \(r_2\) stand for random numbers uniformly distributed in the range [0, 1].
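The update rules of Eqs. (14)-(15) translate into a short optimization loop. The sketch below is illustrative: the swarm size, iteration count, and the values of \(\omega\), \(c_1\), \(c_2\) are assumed defaults, and a simple sphere function stands in for the actual objective (in the paper's model this would be the LSTM's validation error over its hyperparameters).

```python
import random

def pso(f, dim=2, n_particles=20, iters=200,
        w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimal PSO sketch implementing Eqs. (14)-(15)."""
    rng = random.Random(seed)
    pos = [[rng.uniform(-5, 5) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [f(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]

    for _ in range(iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Eq. (14): inertia + cognitive + social terms
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                # Eq. (15): move the particle with the new velocity
                pos[i][d] += vel[i][d]
            val = f(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val

# Sphere function as a stand-in objective
best, best_val = pso(lambda p: sum(x * x for x in p))
```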

LSTM, a refined version of the RNN structure, was proposed to resolve the gradient vanishing problem of RNNs [49]. The LSTM employs a gating approach to control the addition or removal of the cells’ state information. The cell state is equivalent to an information transmission path implemented along the whole chain, and the gate framework selectively allows information to pass. Figure 1 describes the structure of an individual LSTM module.

In the LSTM structure, \(C_{t-1}\) and \(h_{t-1}\) represent the cell state and output of the previous step, and \(y_t\) refers to the new input. The \(\tanh\) block generates the new candidate cell value, and the three \(\sigma\) symbols are the input, output, and forgetting gates, respectively. These three gates cooperate to control the whole cell state. The renewal process of each cell is as follows:

  (1)

    Let \(W_f\) and \(b_f\) represent the weights and bias. The following formula determines all information to be abandoned in the forgetting gate:

    $$\begin{aligned} f_t=\sigma (W_f[h_{t-1},y_t]+b_f) \end{aligned}$$
    (16)
  (2)

    The following formula calculates all information to be updated in the input gate:

    $$\begin{aligned} \begin{array}{l} i_{t}=\sigma \left( W_{i}\left[ h_{t-1}, y_{t}\right] +b_{i}\right) \\ \tilde{C}_{t}=\tanh \left( W_{C}\left[ h_{t-1}, y_{t}\right] +b_{C}\right) \\ C_{t}=f_{t} * C_{t-1}+i_{t} * \tilde{C}_{t} \end{array} \end{aligned}$$
    (17)
  (3)

    The following formula updates the cell’s output gate:

    $$\begin{aligned} \begin{array}{l} S_{t}=\sigma \left( W_{S}\left[ h_{t-1}, y_{t}\right] + b_{S}\right) \\ h_{t}=S_t * \tanh (C_t) \end{array} \end{aligned}$$
    (18)
Fig. 1
figure 1

The Structure of LSTM
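The gate updates above can be expressed as a single cell step. The following is a minimal numpy sketch of a standard LSTM cell (not the authors' implementation); the dictionary layout of the weights and the toy sizes are assumptions for illustration.

```python
import numpy as np

def lstm_step(y_t, h_prev, C_prev, W, b):
    """One LSTM cell update: W and b hold the weights/biases of the
    forgetting (f), input (i), candidate (C) and output (S) gates,
    each acting on the concatenation [h_{t-1}, y_t]."""
    sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))
    z = np.concatenate([h_prev, y_t])
    f_t = sigmoid(W["f"] @ z + b["f"])        # forgetting gate
    i_t = sigmoid(W["i"] @ z + b["i"])        # input gate
    C_tilde = np.tanh(W["C"] @ z + b["C"])    # candidate cell value
    C_t = f_t * C_prev + i_t * C_tilde        # cell state update
    S_t = sigmoid(W["S"] @ z + b["S"])        # output gate
    h_t = S_t * np.tanh(C_t)                  # new output
    return h_t, C_t

# Tiny example: input size 2, hidden size 3, small random weights
rng = np.random.default_rng(0)
W = {k: 0.1 * rng.standard_normal((3, 5)) for k in "fiCS"}
b = {k: np.zeros(3) for k in "fiCS"}
h, C = lstm_step(rng.standard_normal(2), np.zeros(3), np.zeros(3), W, b)
```

Because the output is the product of a sigmoid gate and a \(\tanh\), every component of h is strictly inside \((-1, 1)\).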

Based on the above discussion, the current study starts with the mode decomposition method improved by the sample entropy, followed by the revelation of the source signals as independent components (ICs). In the end, the neural network optimized by the heuristic algorithm is adopted to produce the prediction results. The detailed flow chart is illustrated in Fig. 2, which depicts the framework of the established model. According to their goals, three parts can be distinguished. In the decomposition unit, the original series X is decomposed into a range of IMFs through the CEEMD model (the subscript n stands for the number of IMFs rather than the sifting times in the CEEMD algorithm, while the residual function Res is treated as an additional IMF for convenience). Considering that too many IMFs accumulate errors, the SE is used to reconstruct the IMFs. Next, the ICA is conducted to reveal the independent components of the IMFs. Finally, the LSTM optimized by PSO is utilized to forecast all independent components. The parameters of all algorithms are listed in Table 1.

Fig. 2
figure 2

Framework of the Proposed Forecasting Model

Table 1 The parameter setting of these methods

3.3 Evaluation Criteria


In order to assess the accuracy of prediction results, five measures are chosen as evaluation criteria: mean absolute error (MAE), root mean square error (RMSE), mean absolute percentage error (MAPE), directional accuracy (DA) prediction statistic, and coefficient of determination \((R^2)\) [50,51,52]. These criteria are defined as follows:

$$\begin{aligned}&M A P E=\frac{1}{n} \sum _{i=1}^{n} \mid \frac{Y_{i}-\hat{Y}_{i}}{Y_{i}} \mid \times 100 \% \end{aligned}$$
(19)
$$\begin{aligned}&M A E=\frac{1}{n} \sum _{i=1}^{n} \mid Y_{i}-\hat{Y}_{i} \mid \end{aligned}$$
(20)
$$\begin{aligned}&R M S E=\sqrt{\frac{1}{n} \sum _{i=1}^{n}\left( Y_{i}-\hat{Y}_i\right) ^{2}} \end{aligned}$$
(21)
$$\begin{aligned}&R^2=1-\frac{\sum _{i=1}^{n}\left( \hat{Y}_i-Y_{i}\right) ^{2}}{\sum _{i=1}^{n}\left( \bar{Y}_i-Y_{i}\right) ^{2}} \end{aligned}$$
(22)

The former four measures can be utilized to verify the forecasting accuracy, and the DA is another essential measure, which assesses the prediction of the movement direction [53, 54]. MAPE is the ratio of the absolute error to the actual value; its value lies in \([0, +\infty )\), where 0 denotes a perfect prediction model and a value greater than 100% indicates a poor prediction. MAE and RMSE assess the error between the predicted and real values; their range is \([0, +\infty )\), the greater the error, the greater their value, and they equal 0 when the predicted values coincide exactly with the real values, i.e., when the forecasting model is perfect. The numerator of \(R^2\) represents the sum of the squared differences between the real and predicted values, similar to the RMSE, while its denominator represents the sum of the squared differences between the real values and their mean.

$$\begin{aligned} DA=\frac{1}{n} \sum _{i=1}^{n} d_i \end{aligned}$$
(23)

where \(d_{i}=\left\{ \begin{array}{ll} 1, &{} \left( Y_{i}-Y_{i-1}\right) \left( \hat{Y}_{i}-Y_{i-1}\right) \ge 0 \\ 0, &{} { otherwise } \end{array}\right.\); n stands for the whole number of data points; \(Y_i\) and \(\hat{Y}_i\) describe the actual and predicted values of the ith term, respectively.
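The criteria of Eqs. (19)-(23) are straightforward to implement. The sketch below is illustrative; the DA loop runs over \(i=2,\ldots ,n\) since \(Y_0\) is undefined, and the example series are hypothetical.

```python
import math

def mape(y, yhat):
    return sum(abs((a - p) / a) for a, p in zip(y, yhat)) / len(y) * 100

def mae(y, yhat):
    return sum(abs(a - p) for a, p in zip(y, yhat)) / len(y)

def rmse(y, yhat):
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(y, yhat)) / len(y))

def r2(y, yhat):
    ybar = sum(y) / len(y)
    ss_res = sum((p - a) ** 2 for a, p in zip(y, yhat))
    ss_tot = sum((ybar - a) ** 2 for a in y)
    return 1.0 - ss_res / ss_tot

def da(y, yhat):
    """Directional accuracy: share of steps whose predicted move
    (yhat_i - y_{i-1}) has the same sign as the actual move."""
    hits = sum(1 for i in range(1, len(y))
               if (y[i] - y[i - 1]) * (yhat[i] - y[i - 1]) >= 0)
    return hits / (len(y) - 1)

actual = [10.0, 11.0, 10.5, 11.5, 12.0]
pred = [10.1, 10.9, 10.6, 11.4, 12.1]
```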


MAPE, MAE, and RMSE are employed to determine the difference between the predicted and actual values; the lower their values, the better the prediction accuracy. Conversely, higher values of DA and \(R^2\) indicate better prediction accuracy. In addition, Shannon entropy is utilized to measure the complexity of neural networks, expressed as follows:

$$\begin{aligned} S_e=-k \sum _{i=1}^{N} p_{i} \log p_{i}, \quad p_{i}=\frac{{\text {D}}(i)}{\sum _{i=1}^{N} {\text {D}}(i)} \end{aligned}$$
(24)

where N denotes the number of nodes, \({\text {D}}(i)\) represents the degree of the ith node, and \(k=1.38\times 10^{-23}\) denotes the Boltzmann constant.
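The entropy of Eq. (24) can be checked on a toy degree sequence. In the sketch below, k is kept as a free constant and set to 1 for a dimensionless illustration (rather than the Boltzmann constant); the degree sequences are hypothetical.

```python
import math

def shannon_entropy(degrees, k=1.0):
    """Eq. (24) with p_i = D(i) / sum_j D(j); zero-degree nodes
    contribute nothing to the sum."""
    total = sum(degrees)
    ps = [d / total for d in degrees]
    return -k * sum(p * math.log(p) for p in ps if p > 0)

# A network whose nodes all have equal degree maximizes the entropy
uniform = shannon_entropy([4, 4, 4, 4])       # equals ln(4)
skewed = shannon_entropy([13, 1, 1, 1])       # lower: degrees concentrated
```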

4 Experiments and Discussions

This section is organized as follows. Data sources are introduced in subsection 4.1. The decomposition, reconstruction, and ICA results are presented in subsections 4.2 and 4.3. Finally, the prediction accuracy of different models is compared in subsection 4.4.

4.1 Data

The current work aims to predict the stock price. Based on turnover and total market value, the Shanghai Stock Exchange (SSE) 50 index is chosen, which features a large scale, relative stability, and sufficient liquidity [55]. The current work randomly chooses four stocks in the SSE 50 index as candidate assets, with ticker symbols “SH600518”, “SH600519”, “SH600999”, and “SH601988”. The statistical description of the four stocks is presented in Table 2.

Table 2 The statistical properties of stocks

Data from March 19, 2001 to March 16, 2021 are collected and divided into a training set and a testing set. The training set ranges from March 19, 2001 to April 17, 2017, and the testing set from April 18, 2017 to March 16, 2021, so the ratio of the training set to the testing set is 8:2.

4.2 The CEEMD Decomposition and Reconstruction Results

At first, CEEMD is employed to decompose the raw stock price series into several IMFs. The decomposition results are presented in Figs. 3, 4, 6 and 7, and the SE values of all IMFs obtained by the CEEMD decomposition are presented in Table 3. In order to alleviate the calculation cost, improve the modeling speed, and prevent overfitting, the IMFs with similar sample entropy values are combined. For “SH600518”, the sample entropy values of the first three IMFs (IMF1, IMF2, and IMF3) are higher than the others, indicating a complicated, volatile, and inconspicuous pattern. In contrast, the values of the last three IMFs and the residue (IMF10, IMF11, IMF12, and Res) are much lower, with an evident trend and slight complexity and volatility. Thus, the components can be combined into the following new intrinsic mode functions (NIMFs): the high-frequency sequence NIMF1 (IMF1-3), the medium-frequency sequence NIMF2 (IMF4-6), the low-frequency sequence NIMF3 (IMF7-9), and the trend sequence NIMF4 (IMF10-12, Residual). The variation trend of the four NIMFs is relatively normal and simple, suitable for further deriving the variation characteristics of each IMF and for training the predictive model. The same reconstruction scheme is chosen for the other three stocks. The detailed reconstruction results are presented in Fig. 5.
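The SE-guided reconstruction step can be sketched as follows. Note this is an illustrative stand-in: it fuses IMFs by equal-width entropy bands, whereas the paper groups consecutive IMFs with similar SE values by inspection; the mock IMFs and entropy values are hypothetical.

```python
import numpy as np

def group_imfs(imfs, entropies, n_groups=4):
    """Fuse IMFs whose sample entropies fall into the same band
    (high/medium/low frequency + trend), returning the NIMFs."""
    lo, hi = min(entropies), max(entropies)
    groups = [np.zeros_like(imfs[0]) for _ in range(n_groups)]
    for imf, e in zip(imfs, entropies):
        idx = min(int((e - lo) / (hi - lo) * n_groups), n_groups - 1)
        groups[idx] += imf
    return groups  # the new intrinsic mode functions (NIMFs)

# Eight mock IMFs with SE values decreasing from IMF1 to the residue
rng = np.random.default_rng(0)
imfs = [rng.standard_normal(100) for _ in range(8)]
entropies = [1.9, 1.7, 1.6, 0.9, 0.8, 0.3, 0.2, 0.05]
nimfs = group_imfs(imfs, entropies)
```

Since every IMF is assigned to exactly one group, the NIMFs sum back to the same total as the original IMFs, so the reconstruction loses no information.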

Fig. 3
figure 3

The decomposition results of CEEMD for SH600518

Fig. 4
figure 4

The decomposition results of CEEMD for SH600519

Table 3 The values of sample entropy of IMFs
Fig. 5
figure 5

The results of IMF reconstruction

Fig. 6
figure 6

The decomposition results of CEEMD for SH600999

Fig. 7
figure 7

The decomposition results of CEEMD for SH601988

4.3 The ICA Results

The separation unit is another step of the established model. We start by separating the ICs: the ICA model is applied to the IMFs to attain the ICs, which are found by revealing the source signals. Finally, the ICs are taken as the input of the LSTM for the construction of the forecasting model.

The four ICs are separated from the recombination modes through the ICA. Figs. 8, 9, 10 and 11 describe the final results obtained with the enhanced CS-ICA method. Based on Eq. (13), four separation matrices are derived.

$$\begin{aligned}&M_{S H 600518} =\left( \begin{array}{cccc} 0.2845 &{} 0.2116 &{} 0.4625 &{} -0.8127 \\ -0.1737 &{} 0.5012 &{} 0.7049 &{} 0.4708 \\ 0.1273 &{} 0.8347 &{} -0.5341 &{} -0.0420 \\ 0.9342 &{} -0.0849 &{} 0.0630 &{} 0.3408 \end{array}\right) \end{aligned}$$
(25)
$$\begin{aligned}&M_{S H 600519}=\left( \begin{array}{cccc} 0.5367 &{} 0.3560 &{} -0.4494 &{} 0.6191 \\ -0.6338 &{} -0.0869 &{} 0.1976 &{} 0.7428 \\ 0.5566 &{} -0.4717 &{} 0.6363 &{} 0.2506 \\ 0.0206 &{} 0.8020 &{} 0.5951 &{} -0.0470 \end{array}\right) \end{aligned}$$
(26)
$$\begin{aligned}&M_{S H 600999}=\left( \begin{array}{cccc} -0.3628 &{} -0.0012 &{} -0.2276 &{} 0.9036 \\ 0.9289 &{} 0.0146 &{} -0.0119 &{} 0.3700 \\ -0.0328 &{} 0.9648 &{} 0.2555 &{} 0.0525 \\ -0.0672 &{} -0.2625 &{} 0.9396 &{} 0.2093 \end{array}\right) \end{aligned}$$
(27)
$$\begin{aligned}&M_{S H 601988}=\left( \begin{array}{cccc} 0.9990 &{} -0.0078 &{} -0.0004 &{} 0.0438 \\ -0.0044 &{} 0.6854 &{} 0.6910 &{} 0.2294 \\ 0.0149 &{} 0.7281 &{} -0.6518 &{} -0.2142 \\ 0.0417 &{} 0.0010 &{} 0.3141 &{} -0.9485 \end{array}\right) \end{aligned}$$
(28)
Fig. 8
figure 8

The results of ICA for SH600518

Fig. 9
figure 9

The results of ICA for SH600519

Fig. 10
figure 10

The results of ICA for SH600999

Fig. 11
figure 11

The results of ICA for SH601988

The stock price series can then be estimated with the following formulas. The mixing matrix A is the inverse of M, and \(a_i\) stands for the sum of the ith column of A.

$$\hat{x}(t)_{SH600518}=\sum _{i=1}^{4} a_{i} s_{i}(t)=0.1459 s_{1}+1.5032 s_{2}+0.3859 s_{3}+1.2531 s_{4}$$
(29)
$$\hat{x}(t)_{SH600519}=\sum _{i=1}^{4} a_{i} s_{i}(t)=1.0624 s_{1}+0.2197 s_{2}+0.9718 s_{3}+1.3707 s_{4}$$
(30)
$$\hat{x}(t)_{SH600999}=\sum _{i=1}^{4} a_{i} s_{i}(t)=0.3120 s_{1}+1.3016 s_{2}+1.2400 s_{3}+0.8192 s_{4}$$
(31)
$$\hat{x}(t)_{SH601988}=\sum _{i=1}^{4} a_{i} s_{i}(t)=1.0346 s_{1}+1.6014 s_{2}-0.1222 s_{3}-0.5917 s_{4}$$
(32)
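The relationship between Eqs. (25) and (29) can be checked numerically: inverting the separation matrix \(M_{SH600518}\) and summing the columns of the result recovers the reconstruction coefficients \(a_i\).

```python
import numpy as np

# Separation matrix for SH600518, taken from Eq. (25)
M = np.array([
    [ 0.2845,  0.2116,  0.4625, -0.8127],
    [-0.1737,  0.5012,  0.7049,  0.4708],
    [ 0.1273,  0.8347, -0.5341, -0.0420],
    [ 0.9342, -0.0849,  0.0630,  0.3408],
])

A = np.linalg.inv(M)    # mixing matrix: inverse of the separation matrix
a = A.sum(axis=0)       # a_i: sum of the ith column of A
# a ≈ [0.1459, 1.5032, 0.3859, 1.2531], the coefficients of Eq. (29)
```

Note that the rows of M are nearly orthonormal (a consequence of ICA on whitened data), so A is approximately the transpose of M and each \(a_i\) equals the ith row sum of M to rounding precision.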

4.4 Comparing the Prediction Accuracy

In order to verify the accuracy of the established model, the current work employs several benchmark models for comparison, including ARIMA, BP, SVM, LSTM, CNN, PSO-LSTM, CS-PSO-LSTM, and CS-ICA-PSO-LSTM (CS refers to the CEEMD optimized by SE). The prediction results of the established model are compared with the other benchmark models in Tables 4, 5, 6 and 7. The following results can be obtained:

  1. (1)

    The LSTM is superior to ARIMA and BP, as described in Table 4. Compared with ARIMA and BP for SH600518, the decreases in the MAE index in the 1-step forecasting are 89.57% and 49.21%, respectively. As shown in Table 5, for MAPE of SH600519, the declines are 51.25% and 24.98% in the 2-step prediction, respectively. As presented in Table 4, the MAE, RMSE, and MAPE of the LSTM model are significantly lower than those of the ARIMA model in the 3-step prediction: compared with ARIMA, the LSTM model decreases MAE, RMSE, and MAPE by 72.84%, 71.35%, and 72.92%, respectively. The results reveal that deep learning can efficiently mine the long-term dependence of stock series. From the experiments involving different single models, we can conclude that LSTM plays a positive role in the forecasting performance of stock price.

  2. (2)

    The PSO-LSTM nonlinear benchmark model is constructed to evaluate the PSO algorithm's efficiency for stock price forecasting. PSO-LSTM gives higher prediction performance. As shown in Table 6, compared with LSTM at 2-step forecasting for SH600999, the MAE, RMSE, and MAPE values of PSO-LSTM decrease by 45.07%, 35.91%, and 43.15%, respectively. Hence, it is verified that the PSO algorithm is well suited to stock forecasting. Compared with the ARIMA model, the PSO-LSTM model also provides better efficiency in 1-step prediction. For instance, Table 5 shows that the MAE, RMSE, and MAPE values of PSO-LSTM are 18.33%, 70.91%, and 74.14% lower than those of ARIMA in 1-step prediction, respectively, and DA differs by 0.09%. According to these analyses, the PSO approach, which provides optimal parameters for the LSTM, achieves a better forecasting performance than the LSTM model alone.

  3. (3)

    Compared with PSO-LSTM, CS-PSO-LSTM provides higher prediction accuracy. For example, the MAE, RMSE, and MAPE of CS-PSO-LSTM at 1-step forecasting are reduced by 41.67%, 31.25%, and 36.11% for SH601988, respectively. The results indicate that the CEEMD algorithm optimized by SE is an effective decomposition algorithm. Compared with ARIMA, BP, and LSTM, the reduction in MAPE of the CS-PSO-LSTM model for SH601988 in the 3-step forecasting is 80.60%, 78.37%, and 50.18%, respectively, demonstrating the positive influence of the CS decomposition on the LSTM model. As a result, the CS-PSO-LSTM approach makes great progress over the PSO-LSTM model in the field of stock price forecasting.

  4. (4)

    The ICA method can improve forecasting accuracy. As presented in Table 5, the MAE, RMSE, and MAPE of CS-ICA-PSO-LSTM in 1-step forecasting for SH600519 are reduced by 68.81%, 73.51%, and 54.03%, respectively; this performance is better than that of the other benchmark models. As also presented in Table 5, the DA indices of CS-ICA-PSO-LSTM are superior to those of ARIMA, BP, LSTM, and PSO-LSTM in terms of the 1-step forecasting performance. According to the above analyses, the performance of the CS-ICA-PSO-LSTM approach is excellent and economically feasible. The forecasting results show that the proposed approach with the ICA method can capture the essential characteristics of stock price data and improve the forecasting performance.

  5. (5)

    Because the forecasting horizon significantly affects the developed model's stability, predictions are made for 1-step, 2-step, and 3-step horizons. For SH601988, the MAPE values at the 1-, 2-, and 3-step horizons are 0.71%, 0.86%, and 0.99%, respectively. The forecasting accuracy degrades as the forecasting horizon increases.
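The comparisons above rest on four evaluation indices and a relative-improvement figure. As a sketch (the paper does not spell out its exact formulas here), the snippet below implements the standard definitions of MAE, RMSE, and MAPE, assumes DA is the percentage of steps where the predicted direction of change matches the actual one, and computes percentage reduction of an error metric against a benchmark.

```python
import numpy as np

def metrics(y_true, y_pred):
    """MAE, RMSE, MAPE (%), and directional accuracy DA (%)."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_pred - y_true
    mae = np.mean(np.abs(err))
    rmse = np.sqrt(np.mean(err ** 2))
    mape = np.mean(np.abs(err / y_true)) * 100
    # DA: share of steps whose predicted direction of change is correct
    da = np.mean(np.sign(np.diff(y_pred)) == np.sign(np.diff(y_true))) * 100
    return mae, rmse, mape, da

def improvement(base, new):
    """Percentage reduction of an error metric relative to a benchmark."""
    return (base - new) / base * 100
```

For example, `improvement(mae_arima, mae_lstm)` yields figures of the kind quoted in result (1), such as the 89.57% MAE reduction of LSTM over ARIMA.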

Table 4 Forecasting performance of different models for SH600518
Table 5 Forecasting performance of different models for SH600519
Table 6 Forecasting performance of different models for SH600999
Table 7 Forecasting performance of different models for SH601988

5 Conclusion and Future Work

As a crucial research area, stock forecasting has attracted great attention owing to its potential financial benefits. Precise forecasting of stock price fluctuations requires analyzing the fluctuations and source signals and establishing a suitable forecasting model. To overcome the weak performance of existing methods, the current paper presents a hybrid model to forecast stock prices through CEEMD, SE, ICA, PSO, and LSTM. The current study includes the following tasks. First, the stock price is adaptively decomposed into several sequences. Then, SE is employed to reconstruct the decomposed sequences according to their complexity. The ICA model is then employed to separate the ICs describing the internal formation mechanism of the original data. Finally, the IC components are chosen as the input data of the LSTM approach, while the PSO algorithm is employed for fine-tuning the LSTM model's hyperparameters.

The proposed CS-ICA-PSO-LSTM model combines several models, including the CEEMD, SE, ICA, PSO, and LSTM, to forecast the stock price. The outstanding performance is attributed to the following reasons: (1) Based on the theory of "granular computing" and "decomposition and ensemble", the raw stock price data are decomposed into different components. On the one hand, the hidden information is revealed; on the other hand, different features (trend, period, random, etc.) are classified. (2) SE is conducted to restructure the IMFs and alleviate the cumulative error. (3) The ICA technique can describe the internal foundation structure of the IMFs, which is key to revealing the essence of the original signal. In theory, a useful attempt is made by integrating the idea of "granular computing" with "decomposition and ensemble" to construct a forecasting model for non-stationary data. In practice, the research results provide a scientific reference for the business community and researchers.
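The SE restructuring in reason (2) groups IMFs by their complexity. As an illustrative sketch, the function below implements the standard sample entropy SampEn(m, r) with the commonly used defaults m = 2 and r = 0.2 times the series standard deviation; the paper does not report its exact parameter choices, so these are assumptions.

```python
import numpy as np

def sample_entropy(x, m=2, r=None):
    """Sample entropy SampEn(m, r) of a 1-D series, Chebyshev distance."""
    x = np.asarray(x, dtype=float)
    if r is None:
        r = 0.2 * np.std(x)          # common default tolerance
    n = len(x)

    def matches(length):
        # count template pairs (i < j) whose max coordinate gap is <= r
        tpl = np.array([x[i:i + length] for i in range(n - m)])
        count = 0
        for i in range(len(tpl) - 1):
            d = np.max(np.abs(tpl[i + 1:] - tpl[i]), axis=1)
            count += int(np.sum(d <= r))
        return count

    b, a = matches(m), matches(m + 1)
    return -np.log(a / b) if a > 0 and b > 0 else np.inf
```

A smooth, predictable IMF yields a low SampEn and a noise-like IMF a high one, which is what allows modes of similar complexity to be recombined before ICA.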

Stock prices affected by emergencies remain difficult to forecast accurately. Future work includes: (1) establishing an intelligent forecasting model without considering the number of models employed in the prediction approach, (2) presenting an optimal hybrid model, and (3) extending the presented model to other time series, such as wind speed forecasting and gold price forecasting.