Using multi-stage data mining technique to build forecast model for Taiwan stocks
DOI: 10.1007/s00521-011-0628-0
- Cite this article as:
- Huang, C., Chen, P. & Pan, W. Neural Comput & Applic (2012) 21: 2057. doi:10.1007/s00521-011-0628-0
Abstract
The Taiwan stock market changes quickly. It is affected not only by individual investors and the three major institutional investors but also by domestic political and economic conditions. Therefore, to track stock market movements accurately, one needs a well-constructed stock forecast model. In this article, we use a multi-stage optimized stock forecast model to capture the changing trend of the stock market. First, data on two stocks, TSMC and UMC, were collected and fed into genetic programing to build a model and obtain its arithmetic expressions. Second, the Artificial Fish Swarm Algorithm was used to dynamically adjust the variable factors and constant factors in the arithmetic expressions. Third, the error term (ε) of the arithmetic expressions was passed to a Gray Model Neural Network to be forecast. Finally, the Artificial Fish Swarm Algorithm was used to dynamically adjust the parameters of the Gray Model Neural Network and so improve the precision of the stock forecast model as a whole. The results show that the forecast capability at each stage of the optimization process is better than at the previous stage, and that the mixed stock forecast model of stage 4 (GP–AFSA+GMNN–AFSA) achieves the highest precision.
Keywords
Data mining, Genetic programing, Grey model neural network, Artificial fish swarm algorithm, Arithmetic expressions
1 Introduction
In recent years, many kinds of financial products, including stocks, futures, and bonds, have been introduced to the market as the economy strengthened. These products have expanded the options for personal wealth management. Among them, listed shares have been available for a long time and are familiar to the public, so stock investment has become a major tool for managing personal wealth. Stock investors are eager to know how to select stocks that yield a profit. Two analytical methods are most often used for stock selection: fundamental analysis and technical analysis. Fundamental analysis focuses on a listed company's operations and financial status to forecast its future profit or loss, and stocks are selected accordingly. Technical analysis focuses on historical stock price movements, which may show a pattern or trend that investors can use to forecast the likelihood of a rise or fall in price and then decide whether to invest. In this article, we focus on technical analysis and build a multi-stage optimized stock forecast model.
Many studies have addressed the construction of stock forecast models [1, 2]. Drawing on this literature, we collected data on two well-known listed semiconductor companies, TSMC (2330) and UMC (2303), and used the sample data to build a genetic programing model (GP Model). Then, following Professor Jabeen's work [3, 4], we focused on the weights of the variables (variable factors) in the arithmetic expressions produced by genetic programing and used the artificial fish swarm algorithm (AFSA) to adjust them dynamically (GP–AFSA Model). Next, we used the gray model neural network (GMNN) to forecast the error term (ε) of the GP–AFSA forecast (GP–AFSA+GMNN Model). Finally, we used the Artificial Fish Swarm Algorithm to dynamically adjust the parameters of the Gray Model Neural Network (GP–AFSA+GMNN–AFSA Model). This gives four stock forecast models in total. We compare the forecast capability of these four models to provide a reference for investors and researchers selecting target stocks.
This article is divided into four sections. Section 1 states the research purpose. Section 2 covers the literature related to genetic programing, the Artificial Fish Swarm Algorithm, and the Gray Model Neural Network. Section 3 presents the sample data and the empirical analysis. Section 4 gives conclusions and suggestions.
2 Research method
2.1 Artificial fish swarm algorithm
- 1. Random food search: Fish normally swim randomly, but when they find food nearby they swim toward it. Let the current state of a fish be Xi, with food concentration Yi, and randomly pick a state Xj within its sensing range. If Yi < Yj, the fish moves one step in that direction; if not, it selects another state Xj and checks again whether moving forward is justified. If no move is justified after several trials, the fish moves one step at random.
- 2. Group behavior: Fish naturally gather in large groups. Let the current state of a fish be Xi, and count the number of fish (nf) within its visible range (Dij < Visual). If nf/N < δ, the center of the group has more food and is not too crowded. If, in addition, Yi < Yc, the fish moves one step toward the center of the group; otherwise it falls back to food search.
- 3. Following behavior: When a single fish finds a spot with plenty of food, others soon follow. Let the current state of a fish be Xi, and let Xmax be the best neighbor within its visible range. If Yi < Ymax and nf/N < δ, there is more food around Xmax and the area is not crowded, so the fish moves one step toward Xmax; otherwise it falls back to food search.
To design the Artificial Fish Swarm Algorithm, we first build a model of an individual artificial fish (AF). Each fish chooses the behavior most suitable for itself, and the optimized result for the group is then found through the group as a whole or through some individual.
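The three behaviors above can be sketched in a few lines of Python. This is a minimal illustration, not the authors' implementation: the parameter names (`visual`, `step`, `delta`, `try_number`), the random step rule, the choice to try both group and following behavior and keep the better move, and the quadratic test objective are all assumptions made for the example.

```python
import random


def afsa_maximize(food, dim, bounds, n_fish=20, visual=1.0, step=0.3,
                  delta=0.6, try_number=10, iterations=100, seed=0):
    """Maximize food(x) with a simplified artificial fish swarm (illustrative)."""
    rng = random.Random(seed)
    lo, hi = bounds

    def clip(x):
        return [min(hi, max(lo, v)) for v in x]

    def dist(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b)) ** 0.5

    def move_toward(x, target):
        # Take one step of random length (at most `step`) toward `target`.
        d = dist(x, target)
        if d == 0:
            return x[:]
        r = rng.random() * step
        return clip([p + r * (q - p) / d for p, q in zip(x, target)])

    def random_within(x, radius):
        return clip([p + rng.uniform(-radius, radius) for p in x])

    def neighbors_of(x, fish):
        return [f for f in fish if f is not x and dist(x, f) < visual]

    def prey(x):
        # Random food search: try several states Xj inside the visible range.
        for _ in range(try_number):
            xj = random_within(x, visual)
            if food(xj) > food(x):
                return move_toward(x, xj)
        return random_within(x, step)  # no better state found: random step

    def swarm(x, fish):
        # Group behavior: move toward the neighbors' center if it is better
        # and not crowded (nf / N < delta); otherwise search for food.
        nb = neighbors_of(x, fish)
        if nb:
            center = [sum(c) / len(nb) for c in zip(*nb)]
            if food(center) > food(x) and len(nb) / len(fish) < delta:
                return move_toward(x, center)
        return prey(x)

    def follow(x, fish):
        # Following behavior: move toward the best visible neighbor Xmax.
        nb = neighbors_of(x, fish)
        if nb:
            x_max = max(nb, key=food)
            if food(x_max) > food(x) and len(nb) / len(fish) < delta:
                return move_toward(x, x_max)
        return prey(x)

    fish = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_fish)]
    best = max(fish, key=food)[:]
    for _ in range(iterations):
        # Each fish tries both behaviors and keeps the more rewarding move.
        fish = [max((swarm(x, fish), follow(x, fish)), key=food) for x in fish]
        champion = max(fish, key=food)
        if food(champion) > food(best):
            best = champion[:]  # record the best state seen so far
    return best, food(best)


def example_food(x):
    # Hypothetical objective with its maximum (value 0) at (1, -2).
    return -((x[0] - 1.0) ** 2) - (x[1] + 2.0) ** 2


best_state, best_value = afsa_maximize(example_food, dim=2, bounds=(-5.0, 5.0))
print(best_state, best_value)
```

In the setting of this article, `food` would be the negative forecast error of a model, and each fish would hold one candidate set of factors and constants.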
2.2 Gray model neural network
In the equation, y_{2}, …, y_{n} are the system input parameters; y_{1} is the system output parameter; and a, b_{1}, b_{2}, …, b_{n−1} are the coefficients of the differential equation.
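The equation these symbols refer to does not appear in this text. Assuming it is the standard whitening equation of the GM(1,N) grey model, which is consistent with the symbols listed (one output y_{1}, inputs y_{2}, …, y_{n}, and coefficients a, b_{1}, …, b_{n−1}), it would read:

```latex
\frac{\mathrm{d}y_{1}}{\mathrm{d}t} + a\,y_{1}
  = b_{1}\,y_{2} + b_{2}\,y_{3} + \cdots + b_{n-1}\,y_{n}
```

Under this assumption, the Gray Model Neural Network maps the solution of this equation onto a network whose weights are determined by a and b_{1}, …, b_{n−1}.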
2.3 Genetic programing and GPOLS
The symbols (+) and (−) are internal nodes, and the end (terminal) nodes come from the set of elements (X1, X2, and 3) defined by the problem. The arithmetic expression corresponding to the tree is X1 + (3 − X2). Readers can refer to Professor Koza's books on genetic programing. In this article, we used the existing Matlab GPOLS toolbox to construct the genetic programing model. The main idea of this toolbox is to use the orthogonal least squares (OLS) algorithm to build a GP model; one of its results is a polynomial that includes the initial factors of the variables and the constants. Readers can refer to Babu and Karthik [10] for an application of GPOLS. In the next section, we use the Artificial Fish Swarm Algorithm to dynamically adjust these factors and constants to enhance the forecast capability of the model. The Matlab GPOLS toolbox can be downloaded from http://www.fmt.vein.hu/softcomp/gp/gpols.html.
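To make the tree representation concrete, the example expression X1 + (3 − X2) can be encoded and evaluated as below. This is a minimal Python sketch and is unrelated to the Matlab GPOLS toolbox; the nested-tuple encoding and the function names `evaluate` and `to_infix` are assumptions made for illustration.

```python
import operator

# Internal nodes hold an operator; terminal nodes hold a variable name or a constant.
OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def evaluate(node, variables):
    """Recursively evaluate an expression tree for the given variable values."""
    if isinstance(node, tuple):      # internal node: (operator, left, right)
        op, left, right = node
        return OPS[op](evaluate(left, variables), evaluate(right, variables))
    if isinstance(node, str):        # terminal node: variable such as "X1"
        return variables[node]
    return node                      # terminal node: numeric constant


def to_infix(node):
    """Render the tree as a parenthesized arithmetic expression."""
    if isinstance(node, tuple):
        op, left, right = node
        return f"({to_infix(left)} {op} {to_infix(right)})"
    return str(node)


# The tree from the text: "+" at the root, "-" below it, terminals X1, 3, X2.
tree = ("+", "X1", ("-", 3, "X2"))
print(to_infix(tree))                             # (X1 + (3 - X2))
print(evaluate(tree, {"X1": 2.0, "X2": 0.5}))     # 4.5
```

Genetic programing evolves a population of such trees; crossover swaps subtrees between two parents, and mutation replaces a node or subtree.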
3 Empirical research
3.1 Sample data and variables
Table 1 Statistics of technical indices of TSMC and UMC shares
Stock | Index | X1 | X2 | X3 | X4 | X5 | X6 | X7 |
---|---|---|---|---|---|---|---|---|
TSMC | Max | 72.38 | 88.07 | 5.25 | 151 | 13.99 | 100 | 100 |
TSMC | Min | 38.17 | 12.27 | 0.09 | −137 | −8.66 | 0 | 0 |
TSMC | Avg | 59.739 | 51.529 | 0.973 | 1.339 | 0.027 | 47.863 | 48.083 |
TSMC | Std | 7.094 | 15.608 | 0.652 | 9.018 | 2.371 | 31.985 | 19.205 |
UMC | Max | 21.82 | 92.59 | 35 | 52 | 20.07 | 100 | 100 |
UMC | Min | 7.07 | 8.84 | 0.07 | −147 | −11.27 | 0 | 0 |
UMC | Avg | 15.655 | 49.003 | 1.060 | 1.331 | −0.029 | 53.362 | 43.383 |
UMC | Std | 3.754 | 17.354 | 1.326 | 7.771 | 3.135 | 32.266 | 19.912 |
3.2 Use GP and GP–AFSA to construct initial forecast model on closing price
3.3 Use of GMNN and GMNN–AFSA to forecast error of GP–AFSA model
3.4 General comparison of forecast capabilities of the four models
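We cross-validate the models on five groups of TSMC and UMC data to test their stability, using five evaluation indicators for each of the four models. They are RMSE (root mean square error), RTIC, MAE (mean absolute error), MAPE (mean absolute percentage error), and CE.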
Table 2 Cross-examination of the five evaluation indicators
Stock | Model | RMSE | RTIC | MAE | MAPE | CE |
---|---|---|---|---|---|---|
TSMC | GP | 0.704 | 0.051 | 0.519 | 0.046 | 0.914 |
TSMC | GP–AFSA | 0.466 | 0.034 | 0.310 | 0.027 | 0.953 |
TSMC | GP–AFSA+GMNN | 0.382 | 0.028 | 0.282 | 0.020 | 0.977 |
TSMC | GP–AFSA+GMNN–AFSA | 0.184 | 0.021 | 0.223 | 0.018 | 0.985 |
UMC | GP | 0.588 | 0.092 | 0.664 | 0.044 | 0.918 |
UMC | GP–AFSA | 0.290 | 0.057 | 0.471 | 0.028 | 0.943 |
UMC | GP–AFSA+GMNN | 0.238 | 0.045 | 0.260 | 0.023 | 0.960 |
UMC | GP–AFSA+GMNN–AFSA | 0.169 | 0.023 | 0.196 | 0.020 | 0.979 |
Table 2 shows that, for TSMC, the GP–AFSA+GMNN–AFSA forecast model has an RMSE of 0.184, RTIC of 0.021, MAE of 0.223, and MAPE of 0.018, all of which are lower than those of GP, GP–AFSA, and GP–AFSA+GMNN. Its CE is 0.985, which is higher than that of the other three models. Likewise, for UMC, the GP–AFSA+GMNN–AFSA forecast model has an RMSE of 0.169, RTIC of 0.023, MAE of 0.196, and MAPE of 0.020, all of which are lower than those of GP, GP–AFSA, and GP–AFSA+GMNN. Its CE is 0.979, which is higher than that of the other three models. Therefore, GP–AFSA+GMNN–AFSA, the multi-stage stock forecast model, is better than the other three models in terms of forecast capability.
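For readers who want to reproduce this kind of comparison, three of the five indicators have standard definitions and can be computed as in the Python sketch below. The price values are invented for illustration, MAPE is expressed as a fraction rather than a percentage (an assumption about how the table reports it), and RTIC and CE are left out because this text does not give their formulas.

```python
import math


def rmse(actual, predicted):
    """Root mean square error."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def mae(actual, predicted):
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mape(actual, predicted):
    """Mean absolute percentage error, as a fraction; actual values must be nonzero."""
    return sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical closing prices and forecasts, not data from the article.
closing = [60.0, 61.0, 59.0, 62.0]
forecast = [59.5, 61.5, 59.0, 61.0]

print(rmse(closing, forecast))
print(mae(closing, forecast))   # 0.5
print(mape(closing, forecast))
```

Lower values of all three indicate a closer fit, which is the sense in which the later stages in Table 2 improve on the earlier ones.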
4 Conclusions and suggestions
As there are many factors affecting Taiwan stocks, the closing prices of stocks are highly random. Therefore, a closing price forecast model should be as precise as possible. This article focuses on how to build a multi-stage optimized stock forecast model with more modern data mining techniques, as a reference for researchers. Table 2 shows that the forecast capability at each stage of the optimization process is better than at the previous stage, and that the stock forecast model of stage 4 (GP–AFSA+GMNN–AFSA) gives the most precise forecast.
In addition, this article uses AFSA to optimize the forecast model. For future work, we suggest trying other algorithms, such as Particle Swarm Optimization by Eberhart and Kennedy [12] or Bee Colony Optimization by Teodorovic [13–15], to optimize the model.