
1 Introduction

The stock price of a company is an important criterion for measuring the actual value of the company. In the stock market, good decisions depend on good forecasts. Owing to the development of computational and intelligence technologies, new tools have recently been developed to process information for stock forecasting. The analysis of financial market movements has been widely studied in the fields of finance, engineering and mathematics in the last decades [15], and the use of intelligent technologies for stock prediction has spread widely in recent years.

In the past, the most common algorithms for stock forecasting were the artificial neural network (ANN) and the support vector machine (SVM, or SVR for regression). In contrast to the ANN, the SVM is a statistical learning method that is widely used in pattern recognition tasks. In 2003, Kim et al. predicted stock prices using an SVR model and showed that the prediction precision of the support vector regression model was better than that of the back-propagation (BP) neural network model and case-based reasoning (CBR) [11]. In 2006, Xu et al. proposed a revised least squares SVM (LS-SVM) model to forecast Nasdaq Index movements, and the model produced satisfactory results [16].

In the past several years, ANNs have been used in many business applications, which may be due to their non-linear approximation ability; they are frequently used in combination with other methods. Martinez et al. noted that neural networks applied to financial forecasting mostly use a back-propagation algorithm to optimize a multi-layer perceptron (MLP) with high performance [13].

In recent years, ensemble learning and deep learning have developed quickly in many fields [4, 5, 8, 12, 14]. These two methods have their own advantages and disadvantages in handling stock data. In general, there are two different views of stock prediction. (i) Financial analysis methods must be used to obtain high-quality information. In this case, market economy data involve indices with different characteristics, which are suitable to be handled by ensemble learning; however, the simplified computation of these analysis methods causes a loss of information. (ii) Only the stock price history is used: it is possible that all the information is available from the historical behavior of a financial asset as a time series. The time series of stock prices carries enough information and is suitable to be handled by deep learning; however, it also contains much noise and uncertainty. To improve forecast accuracy, we desire to use technical indices while retaining the sequential structure of the stock data, making the two complementary to each other. Thus, a model that combines the advantages of both ensemble learning and deep learning is the objective of this paper. This paper proposes a model combining the ensemble technique of extreme gradient boosting (XGBoost) [4] with a 1D convolutional neural network (CNN), called CGBoost, to obtain better performance.

In addition, we use sparse autoencoders (SAEs) [3, 9] to reduce the noise in the stock price time series; the training is implemented by encoding and decoding the data and reducing the loss in each iteration. If we use only the original data to train CGBoost, without this process, we obtain a highly overfitted result. We also try training one model on several different market indices, so as to test whether the proposed model can unify data from different market indices to improve overall performance.

The remainder of this paper contains three sections. Section 2 details each technique used in this work and how all these techniques are combined into a complete system. Section 3 describes the data sources, the evaluation and other details of the experiment; the results and analysis of the experiment are also in this section. Finally, Sect. 4 draws conclusions and outlines future work.

Fig. 1. Stock prediction system containing sparse autoencoders and convolutional gradient boosting.

2 Methodology

In order to generate deep and invariant features for one-step-ahead stock price forecasting, this paper presents a gradient boosting framework with deep learning for financial time series. The framework uses a deep (CNN) and wide (GBoost) learning-based predicting scheme that integrates the architectures of the CNN and GBoost. The flow chart of this framework, shown in Fig. 1, involves three stages: (1) data preprocessing, i.e. the clipping and normalizing transforms, which are applied to rescale the stock price time series to a common scale; (2) the SAEs, which have a deep architecture trained in an unsupervised manner, combined with 1D CNNs; and (3) GBoost, which uses 1D CNNs to generate the one-step-ahead prediction. Since the first stage is related to the data descriptions, its details are introduced in Sect. 3. The remaining stages are detailed as follows.

2.1 SAEs Training and Denoising

SAEs are a type of deep learning model used to reduce the dimension and noise of data [3, 9]. Since manually adding category tags to data is a very cumbersome process, the machine must learn the important features of the samples by itself. By imposing some restrictions on the hidden layer, SAEs can better express the characteristics of the samples under such harsh conditions. In SAEs, this restriction is the sparseness of the hidden layer.

The sparsity is expressed through the activation states of the neurons. If the sigmoid function is used as the activation function and a neuron's output value is 0, the neuron is regarded as suppressed. The sparsity limit ensures that most neuron outputs are 0, i.e. in the suppressed state. Then, the average activation of each hidden neuron should approximate the sparsity parameter,

$$\begin{aligned} \hat{\rho }_j=\frac{1}{m}\sum ^m_{i=1}\left[ a_j(x^{(i)})\right] ,\quad \hat{\rho }_j\approx \rho , \end{aligned}$$
(1)

where \(a_j\) denotes the activation of hidden neuron j; \(\hat{\rho }_j\) is the average activation of neuron j over the m training samples; and \(\rho \) is a sparsity parameter, usually a small value close to 0 (such as \(\rho = 0.05\)).

In order to enforce this limitation, an additional penalty factor is added to our optimization objective function, which penalizes those \(\hat{\rho }_j\) in the hidden layer that deviate significantly from \(\rho \); it is given by:

$$\begin{aligned} \sum _{j=1}^{s_2} \mathrm{KL}(\rho || \hat{\rho }_j)= \sum _{j=1}^{s_2} \rho \log \frac{\rho }{\hat{\rho }_j} + (1-\rho ) \log \frac{1-\rho }{ 1-\hat{\rho }_j}, \end{aligned}$$
(2)

where \(s_2\) is the number of neurons in the hidden layer, and the index j runs over the neurons of the hidden layer. Then, the overall loss function is expressed as:

$$\begin{aligned} J_\mathrm{sparse}(W,b) = J(W,b) + \beta \sum _{j=1}^{s_2} \mathrm{KL}(\rho || \hat{\rho }_j), \end{aligned}$$
(3)

where \(J(W,b)\) represents the reconstruction loss; \(\beta \) controls the weight of the sparsity penalty factor; and W and b are the weights and biases of the neural network, respectively.

Finally, we apply stochastic gradient descent to optimize W and b so as to minimize \(J_\mathrm{sparse}(W,b)\). After training the SAEs, \(\mathrm{a}(x^{(i)})=\{a_j(x^{(i)})\}_j\) is used as the feature representation of the sample \(\{x^{(i)},y^{(i)}\}\).
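For illustration, a minimal PyTorch sketch of the sparsity-penalized training objective of Eqs. (1)-(3) is given below; the layer sizes and the values of \(\rho \) and \(\beta \) are illustrative assumptions rather than the settings tuned in our experiments.

```python
import torch
import torch.nn as nn

class SparseAutoencoder(nn.Module):
    """One-hidden-layer autoencoder with sigmoid activations."""
    def __init__(self, n_features=20, n_hidden=10):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(n_features, n_hidden), nn.Sigmoid())
        self.decoder = nn.Linear(n_hidden, n_features)

    def forward(self, x):
        a = self.encoder(x)              # hidden activations a_j(x^(i))
        return self.decoder(a), a

def sparse_loss(x, x_hat, a, rho=0.05, beta=1e-3):
    recon = ((x - x_hat) ** 2).mean()    # reconstruction loss J(W, b)
    rho_hat = a.mean(dim=0).clamp(1e-6, 1 - 1e-6)   # average activation, Eq. (1)
    kl = (rho * torch.log(rho / rho_hat)
          + (1 - rho) * torch.log((1 - rho) / (1 - rho_hat))).sum()  # Eq. (2)
    return recon + beta * kl             # J_sparse(W, b), Eq. (3)
```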

2.2 CGBoost Training and Forecast

The gradient boosting algorithm is an ensemble learning technique. The algorithm generates a prediction model by integrating weak prediction models, such as decision trees. It builds the model in a stage-wise manner like other gradient methods and generalizes them by allowing the use of any differentiable loss function.

In the experiments, rather than using the original GBoost, we apply the training scheme of XGBoost [4]. Different from the traditional GBoost method, in which only first-derivative information is used, XGBoost performs a second-order Taylor expansion of the loss function and adds a regularization term to the objective function to balance the decline of the objective, so as to avoid overfitting. The objective function of the base learner is given by:

$$\begin{aligned} Obj^{(t)}\approx \sum ^n_{i=1}\left[ g_if_t(x_i)+\frac{1}{2}h_if^2_t(x_i)\right] +\varOmega (f_t), \end{aligned}$$
(4)

where \(\varOmega (f_t)\) is the L2 regularization \(\sum _l \Vert W_l\Vert ^2\); since all base estimators are CNNs, \(W_l\) denotes the weights of layer l; and \(g_i=\partial l(y_i,y^{(t-1)}_i)/\partial y^{(t-1)}_i\) and \(h_i=\partial ^2 l(y_i,y^{(t-1)}_i)/\partial {y^{(t-1)}_i}^2\). Because the goal is to predict the real price of the stock, the loss function can be the square loss. Then, the form is given by:

$$\begin{aligned} Obj^{(t)}\approx \sum ^n_{i=1}\left[ 2(y_i^{(t-1)}-y_i)f_t(x_i)+f^2_t(x_i)\right] +\varOmega (f_t),\ t\ne 1 \end{aligned}$$
(5)

When all base estimators are obtained, the forecast for \(x_i\) is calculated by \(F(x_i)=\sum ^T_{t=1}f_t(x_i)\), where T denotes the number of base estimators.
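For concreteness, the following Python sketch shows the resulting training loop: under the square loss, minimizing Eq. (4) point-wise amounts to fitting each new base learner to the current residual. Here `make_cnn` stands for any 1D CNN base learner exposing fit/predict methods, and the number of rounds and the shrinkage factor are illustrative assumptions.

```python
import numpy as np

def fit_cgboost(X, y, make_cnn, T=10, lr=0.1):
    """Train T CNN base learners in the XGBoost-style additive scheme."""
    estimators, pred = [], np.zeros_like(y, dtype=float)
    for t in range(T):
        # With l(y, yhat) = (y - yhat)^2: g_i = 2(yhat_i - y_i) and h_i = 2,
        # so the minimizer of Eq. (4) is the residual y_i - yhat_i.
        f_t = make_cnn()
        f_t.fit(X, y - pred)
        pred += lr * f_t.predict(X)      # additive update with shrinkage
        estimators.append(f_t)
    return estimators

def predict_cgboost(estimators, X, lr=0.1):
    return lr * sum(f.predict(X) for f in estimators)   # F(x) = sum_t f_t(x)
```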

Fig. 2. Ensemble model with neural-network base learners.

2.3 1D Residual Network

In this paper, we use a 1D residual neural network (ResNet) [8], a kind of CNN, within both the SAEs and GBoost. Given the modest size of our models, they can already be trained well; however, using a ResNet can still accelerate training significantly [8].

A structural diagram of a standard 1D CNN is shown in Fig. 3. Each layer receives the output of the previous layer and outputs abstract features. During training, the gradient is back-propagated from the output of the last layer. When the number of layers exceeds a certain depth, the gradient vanishes, which makes deep networks difficult to train.

The ResNet applies the idea of "shortcut connections", i.e. cross-layer links, to prevent the gradient from vanishing. The input x is passed directly to the output as the initial result, so the output becomes \(H(x)=F(x)+x\). If \(F(x)=0\), H(x) becomes the identity map, \(H(x)=x\). As the network deepens, it still retains a much shallower path; therefore, the gradient does not shrink as the network deepens.
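A minimal PyTorch sketch of such a 1D residual block is given below; the channel count and kernel size are illustrative assumptions.

```python
import torch.nn as nn

class ResBlock1D(nn.Module):
    """1D residual block computing H(x) = F(x) + x."""
    def __init__(self, channels=32, kernel_size=3):
        super().__init__()
        pad = kernel_size // 2           # keep the sequence length unchanged
        self.body = nn.Sequential(
            nn.Conv1d(channels, channels, kernel_size, padding=pad),
            nn.ReLU(),
            nn.Conv1d(channels, channels, kernel_size, padding=pad),
        )
        self.relu = nn.ReLU()

    def forward(self, x):
        return self.relu(self.body(x) + x)   # shortcut connection
```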

Fig. 3. A standard 1D convolutional neural network diagram.

3 Experiment

The experiments are designed to answer two questions: (1) Can a model that combines ensemble learning and deep learning produce more accurate predictions than a single deep learning model? (2) Is the proposed model able to fit data from different indices and still improve performance?

The proposed model is compared with WSAEs-LSTM [2], which applied a deep learning model to forecast stock price series, so as to answer the first question. Following [2], we chose the "CSI 300", "DJIA", "Hang Seng", "Nifty 50", "Nikkei 225" and "S&P500" indices as the prediction targets. We conducted experiments training one model for each index and training one model on all indices, denoted CGBoost and CGBoost6, respectively. Their results can answer the second question.

Differently from Fig. 1, we use a fixed number of base models in CGBoost. The reason is that, after adjusting the hyper-parameters, we train the model on both the training and validation data, so no validation data remains for testing whether adding a base model improves the result. Besides, the model predicts the stock price indirectly, by predicting the rate of change of the price; in our experience, this approach yields better results.
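A minimal sketch of this indirect target, assuming daily closing prices, is:

```python
import numpy as np

def to_change_rate(prices):
    # r_t = (p_t - p_{t-1}) / p_{t-1}, the regression target of the model
    return np.diff(prices) / prices[:-1]

def to_price(prev_prices, predicted_rates):
    # invert the transform: p_t = p_{t-1} * (1 + r_t)
    return prev_prices * (1.0 + predicted_rates)
```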

3.1 Data Descriptions

The data used in this experiment is detailed as follows.

Data Source. We use the data provided by [2], which was sampled from CSMAR and WIND. The sample covers \(1^\mathrm{st}\) Jul. 2008 to \(30^\mathrm{th}\) Sep. 2016.

Table 1. The technical indices and their definitions.

Data Features. Three types of features are chosen in our experiment. Following the previous literature, the first type includes the OHLC variables, i.e. the price variables (Open, High, Low, and Close price). The second type is the technical indicators of each index, each of which is described in Table 1. The final part of the input is the macroeconomic variables, which are related to the stock price; we chose the Interbank Offered Rate and the US Dollar Index for our system.

Data Division. By the rules of the stock market, we cannot use data from the future. Thus we use the first two years as the training set, the next three months as the validation data and the last three months as the test set. This window is rolled forward in four steps to obtain a one-year prediction result for testing, and the predictions are divided into six years to evaluate accuracy.
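A minimal sketch of this walk-forward split is given below; the window lengths in trading days (two years of training, three months each of validation and testing) are illustrative assumptions.

```python
def walk_forward_splits(n_samples, train_len=504, val_len=63, test_len=63, steps=4):
    """Roll the (train, validation, test) window forward by one test period."""
    splits = []
    for k in range(steps):
        start = k * test_len
        train = range(start, start + train_len)
        val = range(train.stop, train.stop + val_len)
        test = range(val.stop, val.stop + test_len)
        if test.stop <= n_samples:      # never index past the available history
            splits.append((train, val, test))
    return splits
```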

3.2 Evaluation

Following [1, 6, 7, 10], the results are evaluated by MAPE, Theil U and the linear correlation between predicted and real prices (denoted by R). These indicators are defined as follows:

$$\begin{aligned} \begin{aligned} \mathrm {MAPE}=\frac{1}{N}\sum ^N_{t=1}\left| \frac{y_t-y_t^*}{y_t}\right| \end{aligned} \end{aligned}$$
(6)
$$\begin{aligned} \begin{aligned} \mathrm {R}=\frac{\sum ^N_{t=1}(y_t-\overline{y_t})(y^*_t-\overline{y^*_t})}{\sqrt{\sum ^N_{t=1}(y_t-\overline{y_t})^2\sum ^N_{t=1}(y^*_t-\overline{y^*_t})^2}} \end{aligned} \end{aligned}$$
(7)
$$\begin{aligned} \begin{aligned} \mathrm {Theil\ U}=\frac{\sqrt{\frac{1}{N}\sum ^N_{t=1}(y_t-y_t^*)^2}}{\sqrt{\frac{1}{N}\sum ^N_{t=1}(y_t)^2}+\sqrt{\frac{1}{N}\sum ^N_{t=1}(y^*_t)^2}} \end{aligned} \end{aligned}$$
(8)

where \(y^*_t\) is the forecast of the model and \(y_t\) is the actual price at time t; N is the number of predictions, which in our experiment is the number of trading days in a year. R differs from MAPE and Theil U: the larger R is, the closer the predicted prices are to the actual values.
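These three indicators translate directly into NumPy, e.g.:

```python
import numpy as np

def mape(y, y_pred):
    return np.mean(np.abs((y - y_pred) / y))                # Eq. (6)

def r_corr(y, y_pred):
    yc, pc = y - y.mean(), y_pred - y_pred.mean()
    return (yc * pc).sum() / np.sqrt((yc ** 2).sum() * (pc ** 2).sum())  # Eq. (7)

def theil_u(y, y_pred):
    rmse = np.sqrt(np.mean((y - y_pred) ** 2))
    return rmse / (np.sqrt(np.mean(y ** 2)) + np.sqrt(np.mean(y_pred ** 2)))  # Eq. (8)
```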

Table 2. Prediction accuracy on the CSI 300 and DJIA indices.
Table 3. Prediction accuracy on the Hang Seng and Nifty 50 indices.
Table 4. Prediction accuracy on the Nikkei 225 and S&P500 indices.
Fig. 4. Actual and predicted curves from our methods for six stock indices, from 2010.10.01 to 2011.09.30.

3.3 Results

The proposed method improves the results significantly. As shown in Tables 2, 3 and 4, both CGBoost and CGBoost6 have lower average prediction errors, in both MAPE and Theil U, in each year and on each index, and their predictions have a higher linear correlation with the actual prices than the baseline. This result shows that the proposed model can produce more accurate predictions than the deep learning baseline. Besides, the results of CGBoost6 are much better than those of CGBoost, which also answers the second question.

Figure 4 shows an example of the Year 1 predicted prices from the proposed models and the corresponding actual prices. CGBoost6 is closer to the actual stock price time series than CGBoost and has lower volatility.

4 Conclusion and Future Work

In this paper we built a new predicting framework to forecast the next-day stock price of six stock indices from financial markets in different countries. The process for building this predicting framework is as follows: first, clip the high values and normalize the technical indices and other features; second, use 1D ResNet SAEs to denoise and reduce the dimension of the features; last, use CGBoost to predict the next-day price in a supervised manner. Our input features include daily technical indicators, OHLC variables and macroeconomic variables. The main contribution of this paper is the attempt to combine a 1D ResNet with GBoost, a kind of ensemble learning method, in stock prediction, and to demonstrate its performance. Besides, we successfully train one model on different markets and obtain a better prediction on the overall test set.

Future work could focus on increasing the diversity of the base estimators. We may try to replace the identical construction of the basic 1D CNNs with several different constructions, in order to improve the performance of CGBoost. Another interesting direction is to apply CGBoost in other fields: it may be applicable to sequence data comprising several time series of different features, such as weather forecasting and traffic forecasting, and may achieve better performance in these fields.