
1 Introduction

The stock price of a company is an important criterion for measuring the actual value of the company. In the stock market, good decisions depend on good forecasts. Owing to the development of computational and intelligence technologies, new tools have recently been developed to process information for stock forecasting. The analysis of financial market movements has been widely studied in the fields of finance, engineering and mathematics in the last decades [15], and the use of intelligent technologies for stock prediction has spread widely in recent years.

In the past, the most common algorithms for stock forecasting were the artificial neural network (ANN) and the support vector machine (SVM, or SVR for regression). In contrast to the ANN, the SVM is a statistical learning method that is widely used in pattern recognition tasks. In 2003, Kim et al. predicted stock prices using an SVR model and showed that the prediction precision of the support vector regression model was better than that of the back-propagation (BP) neural network model and case-based reasoning (CBR) [11]. In 2006, Xu et al. proposed a revised least squares SVM (LS-SVM) model to forecast Nasdaq Index movements, and the model produced satisfactory results [16].

In the past several years, ANNs have been used in many business applications, which may be due to their non-linear approximation ability; they are frequently used in combination with other methods. Martinez et al. noted that neural networks applied to financial forecasting mostly use a back-propagation algorithm to optimize a multi-layer perceptron (MLP) with high performance [13].

In recent years, ensemble learning and deep learning have developed quickly in many fields [4, 5, 8, 12, 14]. These two methods have their own advantages and disadvantages in handling stock data. In general, there are two different views of stock prediction. (i) Financial analysis methods must be used to obtain high-quality information. In this case, market economy data involve indices with different characteristics, which are suitable to be handled by ensemble learning; however, the simplified computation of these analysis methods causes a loss of information. (ii) Only the stock price history is used: it is possible that all the information is available from the historical behavior of a financial asset as a time series. The time series of stock prices carries enough information and is suitable to be handled by deep learning; however, it also contains much noise and uncertainty. To improve forecast accuracy, we desire to use technical indices while retaining the sequential structure of the stock data, making the two complementary to each other. Thus, a model that combines the advantages of both ensemble learning and deep learning is the objective of this paper. This paper proposes a model combining the ensemble technique of extreme gradient boosting (XGBoost) [4] with a 1D convolutional neural network (CNN), called CGBoost, to obtain better performance.

In addition, we use sparse autoencoders (SAEs) [3, 9] to reduce the noise in the stock price time series; the training is implemented by encoding and decoding the data and reducing the loss in each iteration. If we use only the original data to train CGBoost, without this process, we obtain a highly overfitted result. We also try training one model on several different market indices, so as to test whether the proposed model can unify data from different market indices to improve overall performance.

The remainder of this paper contains three sections. Section 2 details each technique used in this work and how all these techniques are combined into a complete system. Section 3 describes the data sources, the evaluation and other details of the experiment; the results and analysis of the experiment are also in this section. Finally, Sect. 4 draws conclusions and outlines future work.

Fig. 1. Stock prediction system containing sparse autoencoders and convolutional gradient boosting.

2 Methodology

In order to generate deep and invariant features for one-step-ahead stock price forecasting, this paper presents a gradient boosting framework with deep learning for financial time series. The framework uses a deep (CNN) and wide (GBoost) learning-based predicting scheme that integrates the architectures of the CNN and GBoost. The flow chart of this framework, shown in Fig. 1, involves three stages: (1) data preprocessing, i.e. the clipping and normalizing transforms, which are applied to rescale the stock price time series to a common scale; (2) the SAEs, which have a deep architecture trained in an unsupervised manner, combined with 1D CNNs; and (3) GBoost, which uses 1D CNNs to generate the one-step-ahead prediction. Since the first stage is related to the data descriptions, its details are introduced in Sect. 3. The remaining stages are detailed as follows.

2.1 SAEs Training and Denoising

SAEs are a type of deep learning model used to reduce the dimension and noise of data [3, 9]. Since manually adding category tags to data is a very cumbersome process, the machine must learn the important features of the samples by itself. By imposing some restrictions on the hidden layer, SAEs can better express the characteristics of the samples under such harsh conditions. In SAEs, this restriction is the sparseness of the hidden layer.

The sparsity is expressed through the activation states of the neurons. If the sigmoid function is used as the activation function and a neuron's output value is 0, the neuron is regarded as suppressed. The sparsity limit ensures that most neuron outputs are 0, i.e. in the suppressed state. Then, the average activation of each hidden neuron should approximate the sparsity parameter,

$$\begin{aligned} \hat{\rho }_j=\frac{1}{m}\sum ^m_{i=1}\left[ a_j(x^{(i)})\right] ,\quad \hat{\rho }_j\approx \rho , \end{aligned}$$
(1)

where \(a_j\) denotes the activation of hidden neuron j; \(\hat{\rho }_j\) is the average activation of neuron j over the m training samples; and \(\rho \) is a sparsity parameter, usually a small value close to 0 (such as \(\rho = 0.05\)).

In order to enforce this limitation, an additional penalty factor is added to our optimization objective function, which penalizes those \(\hat{\rho }_j\) in the hidden layer that deviate significantly from \(\rho \); it is given by:

$$\begin{aligned} \sum _{j=1}^{s_2} \mathrm{KL}(\rho || \hat{\rho }_j)= \sum _{j=1}^{s_2} \rho \log \frac{\rho }{\hat{\rho }_j} + (1-\rho ) \log \frac{1-\rho }{ 1-\hat{\rho }_j}, \end{aligned}$$
(2)

where \(s_2\) is the number of neurons in the hidden layer, and the index j runs over the neurons of the hidden layer. Then, the overall loss function is expressed as:

$$\begin{aligned} J_\mathrm{sparse}(W,b) = J(W,b) + \beta \sum _{j=1}^{s_2} \mathrm{KL}(\rho || \hat{\rho }_j), \end{aligned}$$
(3)

where \(J(W,b)\) represents the reconstruction loss; \(\beta \) controls the weight of the sparsity penalty factor; and W and b are the weights and biases of the neural network, respectively.

Finally, we apply stochastic gradient descent to optimize W and b so as to minimize \(J_\mathrm{sparse}(W,b)\). After training the SAEs, \(\mathrm{a}(x^{(i)})=\{a_j(x^{(i)})\}_j\) is used as the feature representation of the sample \(\{x^{(i)},y^{(i)}\}\).
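For illustration, a minimal PyTorch sketch of the sparsity-penalized training objective of Eqs. (1)-(3) is given below; the layer sizes and the values of \(\rho \) and \(\beta \) are illustrative assumptions rather than the settings tuned in our experiments.

```python
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    """One-hidden-layer autoencoder with sigmoid activations."""
    def __init__(self, n_features=20, n_hidden=10):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(n_features, n_hidden), nn.Sigmoid())
        self.decoder = nn.Linear(n_hidden, n_features)

    def forward(self, x):
        a = self.encoder(x)              # hidden activations a_j(x^(i))
        return self.decoder(a), a

def sparse_loss(x, x_hat, a, rho=0.05, beta=1e-3):
    recon = ((x - x_hat) ** 2).mean()    # reconstruction loss J(W, b)
    rho_hat = a.mean(dim=0).clamp(1e-6, 1 - 1e-6)   # average activation, Eq. (1)
    kl = (rho * torch.log(rho / rho_hat)
          + (1 - rho) * torch.log((1 - rho) / (1 - rho_hat))).sum()  # Eq. (2)
    return recon + beta * kl             # J_sparse(W, b), Eq. (3)
```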

2.2 CGBoost Training and Forecast

The gradient boosting algorithm is an ensemble learning technique. The algorithm generates a prediction model by integrating weak prediction models, such as decision trees. It builds the model in a stage-wise manner like other gradient methods and generalizes them by allowing the use of any differentiable loss function.

In the experiments, rather than using the original GBoost, we apply the training scheme of XGBoost [4]. Different from the traditional GBoost method, in which only first-derivative information is used, XGBoost performs a second-order Taylor expansion of the loss function and adds a regularization term to the objective function to balance the decline of the objective, so as to avoid overfitting. The objective function of the base learner is given by:

$$\begin{aligned} Obj^{(t)}\approx \sum ^n_{i=1}\left[ g_if_t(x_i)+\frac{1}{2}h_if^2_t(x_i)\right] +\varOmega (f_t), \end{aligned}$$
(4)

where \(\varOmega (f_t)\) is the L2 regularization \(\sum _l \Vert W_l\Vert ^2\); since all base estimators are CNNs, \(W_l\) denotes the weights of layer l; and \(g_i=\partial l(y_i,y^{(t-1)}_i)/\partial y^{(t-1)}_i\) and \(h_i=\partial ^2 l(y_i,y^{(t-1)}_i)/\partial {y^{(t-1)}_i}^2\). Because the goal is to predict the real price of the stock, the loss function can be the square loss. Then, the form is given by:

$$\begin{aligned} Obj^{(t)}\approx \sum ^n_{i=1}\left[ 2(y_i^{(t-1)}-y_i)f_t(x_i)+f^2_t(x_i)\right] +\varOmega (f_t),\ t\ne 1 \end{aligned}$$
(5)

When all base estimators are obtained, the forecast for \(x_i\) is calculated by \(F(x_i)=\sum ^T_{t=1}f_t(x_i)\), where T denotes the number of base estimators.
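For concreteness, the following Python sketch shows the resulting training loop: under the square loss, minimizing Eq. (4) point-wise amounts to fitting each new base learner to the current residual. Here `make_cnn` stands for any 1D CNN base learner exposing fit/predict methods, and the number of rounds and the shrinkage factor are illustrative assumptions.

```python
import numpy as np

def fit_cgboost(X, y, make_cnn, T=10, lr=0.1):
    """Train T CNN base learners in the XGBoost-style additive scheme."""
    estimators, pred = [], np.zeros_like(y, dtype=float)
    for t in range(T):
        # With l(y, yhat) = (y - yhat)^2: g_i = 2(yhat_i - y_i) and h_i = 2,
        # so the minimizer of Eq. (4) is the residual y_i - yhat_i.
        f_t = make_cnn()
        f_t.fit(X, y - pred)
        pred += lr * f_t.predict(X)      # additive update with shrinkage
        estimators.append(f_t)
    return estimators

def predict_cgboost(estimators, X, lr=0.1):
    return lr * sum(f.predict(X) for f in estimators)   # F(x) = sum_t f_t(x)
```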

Fig. 2. Ensemble model with neural-network base learners.

2.3 1D Residual Network

In this paper, we use a 1D residual neural network (ResNet) [8], a kind of CNN, within both the SAEs and GBoost. Given the modest size of our models, they can already be trained well; however, using a ResNet can still accelerate training significantly [8].

A structural diagram of a standard 1D CNN is shown in Fig. 3. Each layer receives the output of the previous layer and outputs abstract features. During training, the gradient is back-propagated from the output of the last layer. When the number of layers exceeds a certain depth, the gradient vanishes, which makes deep networks difficult to train.

The ResNet applies the idea of "shortcut connections", i.e. cross-layer links, to prevent the gradient from vanishing. The input x is passed directly to the output as the initial result, so the output becomes \(H(x)=F(x)+x\). If \(F(x)=0\), H(x) becomes the identity map, \(H(x)=x\). As the network deepens, it still retains a much shallower path; therefore, the gradient does not shrink as the network deepens.
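A minimal PyTorch sketch of such a 1D residual block is given below; the channel count and kernel size are illustrative assumptions.

```python
import torch.nn as nn

class ResBlock1D(nn.Module):
    """1D residual block computing H(x) = F(x) + x."""
    def __init__(self, channels=32, kernel_size=3):
        super().__init__()
        pad = kernel_size // 2           # keep the sequence length unchanged
        self.body = nn.Sequential(
            nn.Conv1d(channels, channels, kernel_size, padding=pad),
            nn.ReLU(),
            nn.Conv1d(channels, channels, kernel_size, padding=pad),
        )
        self.relu = nn.ReLU()

    def forward(self, x):
        return self.relu(self.body(x) + x)   # shortcut connection
```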

Fig. 3. A standard 1D convolutional neural network diagram.

3 Experiment

The experiments are designed to answer two questions: (1) Can a model that combines ensemble learning and deep learning produce more accurate predictions than a single deep learning model? (2) Is the proposed model able to fit data from different indices and still improve performance?

The proposed model is compared with WSAEs-LSTM [2], which applied a deep learning model to forecast stock price series, so as to answer the first question. Following [2], we chose the "CSI 300", "DJIA", "Hang Seng", "Nifty 50", "Nikkei 225" and "S&P500" indices as the prediction targets. We conducted experiments training one model for each index and training one model on all indices, denoted CGBoost and CGBoost6, respectively. Their results can answer the second question.

Differently from Fig. 1, we use a fixed number of base models in CGBoost. The reason is that, after adjusting the hyper-parameters, we train the model on both the training and validation data, so no validation data remains for testing whether adding a base model improves the result. Besides, the model predicts the stock price indirectly, by predicting the rate of change of the price; in our experience, this approach yields better results.
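A minimal sketch of this indirect target, assuming daily closing prices, is:

```python
import numpy as np

def to_change_rate(prices):
    # r_t = (p_t - p_{t-1}) / p_{t-1}, the regression target of the model
    return np.diff(prices) / prices[:-1]

def to_price(prev_prices, predicted_rates):
    # invert the transform: p_t = p_{t-1} * (1 + r_t)
    return prev_prices * (1.0 + predicted_rates)
```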

3.1 Data Descriptions

The data used in this experiment is detailed as follows.

Data Source. We use the data provided by [2], which was sampled from CSMAR and WIND. The sample covers \(1^\mathrm{st}\) Jul. 2008 to \(30^\mathrm{th}\) Sep. 2016.

Table 1. The technical indices and their definitions.

Data Features. Three types of features are chosen in our experiment. Following the previous literature, the first type includes the OHLC variables, i.e. the price variables (Open, High, Low, and Close price). The second type is the technical indicators of each index, each of which is described in Table 1. The final part of the input is the macroeconomic variables, which are related to the stock price; we chose the Interbank Offered Rate and the US Dollar Index for our system.

Data Division. By the rules of the stock market, we cannot use data from the future. Thus we use the first two years as the training set, the next three months as the validation data and the last three months as the test set. This window is rolled forward in four steps to obtain a one-year prediction result for testing, and the predictions are divided into six years to evaluate accuracy.
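A minimal sketch of this walk-forward split is given below; the window lengths in trading days (two years of training, three months each of validation and testing) are illustrative assumptions.

```python
def walk_forward_splits(n_samples, train_len=504, val_len=63, test_len=63, steps=4):
    """Roll the (train, validation, test) window forward by one test period."""
    splits = []
    for k in range(steps):
        start = k * test_len
        train = range(start, start + train_len)
        val = range(train.stop, train.stop + val_len)
        test = range(val.stop, val.stop + test_len)
        if test.stop <= n_samples:      # never index past the available history
            splits.append((train, val, test))
    return splits
```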

3.2 Evaluation

Following [1, 6, 7, 10], the results are evaluated by MAPE, Theil U and the linear correlation between predicted and real prices (denoted by R). These indicators are defined as follows:

$$\begin{aligned} \begin{aligned} \mathrm {MAPE}=\frac{1}{N}\sum ^N_{t=1}\left| \frac{y_t-y_t^*}{y_t}\right| \end{aligned} \end{aligned}$$
(6)
$$\begin{aligned} \begin{aligned} \mathrm {R}=\frac{\sum ^N_{t=1}(y_t-\overline{y_t})(y^*_t-\overline{y^*_t})}{\sqrt{\sum ^N_{t=1}(y_t-\overline{y_t})^2\sum ^N_{t=1}(y^*_t-\overline{y^*_t})^2}} \end{aligned} \end{aligned}$$
(7)
$$\begin{aligned} \begin{aligned} \mathrm {Theil\ U}=\frac{\sqrt{\frac{1}{N}\sum ^N_{t=1}(y_t-y_t^*)^2}}{\sqrt{\frac{1}{N}\sum ^N_{t=1}(y_t)^2}+\sqrt{\frac{1}{N}\sum ^N_{t=1}(y^*_t)^2}} \end{aligned} \end{aligned}$$
(8)

where \(y^*_t\) is the forecast of the model and \(y_t\) is the actual price at time t; N is the number of predictions, which in our experiment is the number of trading days in a year. R differs from MAPE and Theil U: the larger R is, the closer the predicted prices are to the actual values.
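These three indicators translate directly into NumPy, e.g.:

```python
import numpy as np

def mape(y, y_pred):
    return np.mean(np.abs((y - y_pred) / y))                # Eq. (6)

def r_corr(y, y_pred):
    yc, pc = y - y.mean(), y_pred - y_pred.mean()
    return (yc * pc).sum() / np.sqrt((yc ** 2).sum() * (pc ** 2).sum())  # Eq. (7)

def theil_u(y, y_pred):
    rmse = np.sqrt(np.mean((y - y_pred) ** 2))
    return rmse / (np.sqrt(np.mean(y ** 2)) + np.sqrt(np.mean(y_pred ** 2)))  # Eq. (8)
```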

Table 2. Prediction accuracy on the CSI 300 and DJIA indices.
Table 3. Prediction accuracy on the Hang Seng and Nifty 50 indices.
Table 4. Prediction accuracy on the Nikkei 225 and S&P500 indices.
Fig. 4. Actual and predicted curves from our methods for six stock indices, from 2010.10.01 to 2011.09.30.

3.3 Results

The proposed method improves the results significantly. As shown in Tables 2, 3 and 4, both CGBoost and CGBoost6 have lower average prediction errors, in both MAPE and Theil U, in each year and on each index, and their predictions have a higher linear correlation with the actual prices than the baseline. This result shows that the proposed model can produce more accurate predictions than the deep learning baseline. Besides, the results of CGBoost6 are much better than those of CGBoost, which also answers the second question.

Figure 4 shows an example of the Year 1 predicted prices from the proposed models and the corresponding actual prices. CGBoost6 is closer to the actual stock price time series than CGBoost and has lower volatility.

4 Conclusion and Future Work

In this paper we built a new predicting framework to forecast the next-day stock price of six stock indices from financial markets in different countries. The process for building this predicting framework is as follows: first, clip the high values and normalize the technical indices and other features; second, use 1D ResNet SAEs to denoise and reduce the dimension of the features; last, use CGBoost to predict the next-day price in a supervised manner. Our input features include daily technical indicators, OHLC variables and macroeconomic variables. The main contribution of this paper is the attempt to combine a 1D ResNet with GBoost, a kind of ensemble learning method, in stock prediction, and to demonstrate its performance. Besides, we successfully train one model on different markets and obtain a better prediction on the overall test set.

Future work could focus on increasing the diversity of the base estimators. We may try to replace the identical construction of the basic 1D CNNs with several different constructions, in order to improve the performance of CGBoost. Another interesting direction is to apply CGBoost in other fields: it may be applicable to sequence data comprising several time series of different features, such as weather forecasting and traffic forecasting, and may achieve better performance in these fields.