Forecasting of financial data: a novel fuzzy logic neural network based on error-correction concept and statistics
Abstract
First, this paper investigates the effect of good and bad news on volatility in the BUX return time series using asymmetric ARCH models. Then, the accuracy of forecasting models based on statistical (stochastic) methods, machine learning methods, and a soft/granular RBF network is investigated. To forecast the high-frequency financial data, we apply statistical ARMA and asymmetric GARCH-class models. A novel RBF network architecture is proposed based on the incorporation of an error-correction mechanism, which improves the forecasting ability of feed-forward neural networks. These proposed modelling approaches and SVM models are applied to predict the high-frequency time series of the BUX stock index. We found that it is possible to enhance forecast accuracy and achieve significant risk reduction in managerial decision making by applying intelligent forecasting models based on the latest information technologies. On the other hand, we showed that statistical GARCH-class models can identify the presence of leverage effects and react to good and bad news.
Keywords
RBF neural networks · Support vector machines · ARMA/GARCH models · Volatility modelling · Error-correction mechanism
Introduction
To investigate the effects of good and bad news on volatility in the BUX return time series, as well as to predict the high-frequency time series of the BUX stock index, we used quantitative approaches based on time series models that can be derived from the linear filter model. Box and Jenkins [4] integrated the existing knowledge about autoregressive and moving average (ARIMA) models, and ARIMA models have remained very popular in time series modelling ever since. Being linear, however, they cannot capture the non-linearity present in financial data. Financial markets behave like complex, random, or chaotic non-linear systems. In this context, GARCH-class models introduced by Bollerslev [3] arose as an appropriate framework for studying these problems.
Artificial neural networks (ANNs), mathematical models inspired by biological neural systems, are regarded as universal approximators. They are able to perform tasks like pattern recognition, classification, or prediction [1, 8, 11]. They also show great potential in time series prediction, which is applied very often in financial risk management. The large potential of applying ANNs in finance was also confirmed by Hill et al. [13], where the authors showed that ANNs work best in connection with high-frequency financial data. According to Orr [22] and Marcek [18], the most used model of regression neural network type is the RBF neural network. A very important concept put forward in this field is that of fuzzy cognitive maps (FCMs). According to Magalhães et al. [17], FCMs originate from the theories of neural networks, fuzzy logic, and evolutionary computing. Miao et al. [20] extended the FCM to the dynamic cognitive network (DCN), taking into account the dynamic nature of causes with uncertainties. Using historical data, these tools are capable of identifying a pattern, under the assumption that the identified pattern will continue into the future.
The first objective of this paper is to investigate the effect of good and bad news on volatility in the Hungarian stock market. The second objective is to find a functional relation in the behaviour of the BUX time series and, in turn, forecast future values of the series. Firstly, the paper investigates volatility in the BUX return time series using the specified non-linear asymmetric models EGARCH and PGARCH, which make it possible to identify the presence of leverage effects. Then, the paper proposes three forecasting modelling approaches [statistical, neural networks, and support vector machines (SVM)] that generate forecasts of the BUX stock time series. The aim is to examine whether potentially non-linear neural networks outperform the latest statistical methods, or at least generate results comparable with those of statistical models.
This paper is organized as follows. The “Theoretical background” section deals with the theoretical background of ARMA/GARCH-family models and multilayer feed-forward networks, and refreshes some basic concepts of SVM theory. The “Data and volatility modelling” section examines the asymmetric response of volatility to returns in the Hungarian stock market using asymmetric ARCH-type models. In the “Building a prediction model for BUX stock time series and results” section, the resulting statistical models with heteroscedastic noise, the neural networks, and the SVMs are applied to 1-day-ahead prediction of the BUX stock index. Here, a novel RBF neural network model based on the error-correction concept is introduced. The empirical analysis and findings are presented in the “Empirical comparison and discussion” section. The “Conclusion” section concludes this paper.
Theoretical background
ARIMA and asymmetric GARCH-type models
Note that if \(\varepsilon _{t-i} \) is positive, or there is “good news”, the total effect of \(\varepsilon _{t-i} \) is \(\left( {1+ \gamma _i } \right) \varepsilon _{t-i} \). On the contrary, if \(\varepsilon _{t-i} \) is negative, or there is “bad news”, the total effect of \(\varepsilon _{t-i} \) is \(\left( {1-\gamma _i } \right) \left| {\varepsilon _{t-i} } \right| \). Bad news can thus have a larger impact on volatility, and, as stated in the work by Zivot and Wang [24], the value of \(\gamma _i \) would be expected to be negative.
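As a toy illustration of this asymmetry (the values of the shock and of gamma below are hypothetical, not the coefficients estimated later in the paper), the total effect of a shock can be sketched as:

```python
# Illustration of the asymmetric news impact described above.
# gamma is the asymmetry coefficient, eps the shock (hypothetical values).
def total_effect(eps, gamma):
    """Total effect of a shock eps under the asymmetric specification:
    (1 + gamma) * eps for good news (eps > 0),
    (1 - gamma) * |eps| for bad news (eps < 0)."""
    if eps >= 0:
        return (1 + gamma) * eps
    return (1 - gamma) * abs(eps)

# With a negative gamma (as expected for equity returns), bad news of the
# same magnitude produces a larger total effect than good news.
gamma = -0.3
good = total_effect(0.01, gamma)    # ≈ 0.007
bad = total_effect(-0.01, gamma)    # ≈ 0.013
print(good, bad)
```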
Neural networks
Neural networks can be understood as systems which produce outputs based on inputs the user has defined. It is important to note that the user has no knowledge of the internal workings of the ANN. Examples are presented to the network, which then tries to get as close as possible to the desired output by adapting its parameters (weights). A neural network model has a large number of internal variables which must be set well to optimize the outputs.
The mathematical model of the neuron is constructed on the basis of the functional neuron, a central element of the human nervous system, whose task is to transmit information from one neuron to others. The goal of the mathematical neuron is process identification. In other words, we try to find input–output functions such that the output has the desired parameters and the prediction error is minimal.
Let \(F: x_t \in R^{k}\rightarrow y_t \in R^{1}\) be a mapping assigning the k-dimensional vector of inputs \(x_t^T =(x_{1t} ,x_{2t},\ldots ,x_{kt} )\) to the one-dimensional output \(y_{t}\) at a specific time t.
According to Anderson [1], the most commonly used neural network for prediction tasks is the feed-forward network with at least one hidden layer of neurons. In this network, each layer comprises neurons that receive weighted inputs from the preceding layer and send weighted outputs to the neurons in the succeeding layer. There are no feedback connections. The best-known representatives of feed-forward networks are perceptrons and their modified version, the RBF neural network (RBFNN) [10].
The activation function of the output neuron is also different; the output neuron is always activated by the linear function \(y = u\), where u is the potential of the output neuron, calculated as the scalar product of the weight vector v and the output vector o.
Step 1. Randomly initialize the centres of the RBF neurons \(c_j^{(t)} , j~=~1, 2, {\ldots },s\), where s represents the number of chosen RBF neurons (clusters).
Step 2. Apply the new training vector \(x^{\left( t \right) }=\left( {x_1 , x_2 , \ldots ,x_k } \right) \).
Step 3. Find the centre nearest to \(x^{\left( t \right) }\) and update its position as follows: \(c_j^{\left( {t+1} \right) } = c_j^{\left( t \right) } +\lambda \left( t \right) (x^{\left( t \right) }-c_j^{\left( t \right) } )\), where \(\lambda \left( t \right) \) is the learning coefficient, selected as a linearly decreasing function of t by \(\lambda \left( t \right) =\lambda _0 \left( t \right) \left( {1-\frac{t}{N}} \right) \), where \(\lambda _0 \left( t \right) \) is the initial value, t is the presented learning cycle, and N is the number of learning cycles.
Step 4. After the chosen number of epochs, terminate learning; otherwise go to Step 2.
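The four steps above can be sketched as follows (a minimal sketch on synthetic two-dimensional data; the variable names follow the text, and a single epoch is run for brevity):

```python
import numpy as np

# Sketch of the centre-adaptation loop from Steps 1-4 above (adaptive,
# competitive-learning style K-means). The data are synthetic.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 2))           # training vectors x^(t), k = 2
s = 3                                   # number of RBF neurons (clusters)
N = X.shape[0]                          # number of learning cycles
lambda0 = 0.5                           # initial learning coefficient

centres = X[rng.choice(N, size=s, replace=False)].copy()   # Step 1
for t, x in enumerate(X):                                  # Step 2
    lam = lambda0 * (1 - t / N)         # linearly decreasing lambda(t)
    j = np.argmin(np.linalg.norm(centres - x, axis=1))     # nearest centre
    centres[j] += lam * (x - centres[j])                   # Step 3
# Step 4: terminate after the chosen number of epochs (one epoch here)
print(centres)
```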
Novel RBF neural network model based on error-correction mechanism
Support vector regression model
Despite the fact that RBF neural networks possess a number of attractive properties, such as universal approximation ability and a parallel structure, they still suffer from problems like the existence of many local minima and the fact that it is unclear how one should choose the number of hidden units. The SVM method is a comparatively new learning system used in both forecasting and classification problems. This machine uses a hypothesis space of linear functions in a high-dimensional feature space, and it is trained with a learning algorithm based on optimization theory. The SVM was introduced by Vapnik [23]. Support vector regression (SVR) is an extension of the support vector machine algorithm to numeric prediction. It can create complex non-linear decision boundaries while reducing the computational complexity.
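A minimal ε-SVR sketch on synthetic data may clarify the idea (scikit-learn is used here purely for illustration; the paper itself uses Gunn's implementation, and all data and parameter values below are assumptions):

```python
import numpy as np
from sklearn.svm import SVR

# Minimal epsilon-SVR sketch with a Gaussian (RBF) kernel on synthetic data.
rng = np.random.default_rng(1)
X = np.linspace(0, 4, 120).reshape(-1, 1)
y = np.sin(X).ravel() + 0.05 * rng.normal(size=120)

# C regularizes complexity vs. training error; epsilon sets the width of
# the insensitive tube; gamma plays the role of 1/(2*sigma^2).
model = SVR(kernel="rbf", C=100.0, epsilon=0.05, gamma=0.5).fit(X, y)
pred = model.predict(X)
print(np.sqrt(np.mean((pred - y) ** 2)))    # in-sample RMSE
```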
Data and volatility modelling
Our goal is to examine and compare the asymmetric response of equity volatility to return shocks in the Hungarian stock market before, during, and after the financial crisis, for the period January 7, 2004 to December 21, 2012, which provides 2256 daily observations. We thus have a 9-year-long time series of the closing prices of BUX stocks. To access the BUX time series data, see [5]. The return \(r_t \) at time t is defined via the logarithm of the BUX index values \(y_{t}\), that is, \(r_t =\log \left( {y_t } \right) -\log \left( {y_{t-1} } \right) \). The time series of daily BUX values, depicted in Fig. 3 on the left, exhibits non-stationary behaviour; however, it becomes stationary after first differencing. As can be seen from Fig. 3 on the right, returns fluctuate around a mean value close to zero and also show volatility clustering, where large returns tend to be followed by large returns and small returns by small ones. Since volatility was highest in 2008, when BUX values reached their minimum in the investigated period, we divided the basic period into two periods. The first period (the training data set) ran from January 2004 to the end of June 2007, i.e. the time before the global financial crisis, and the second, so-called crisis and post-crisis period (the validation data set, or ex post period), started at the beginning of July 2007 and finished at the end of 2012.
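The return construction and the two-period split can be sketched as follows (the price path below is simulated, and the split index is illustrative rather than the exact end-of-June-2007 position):

```python
import numpy as np

# Log-return construction as defined above, r_t = log(y_t) - log(y_{t-1}),
# on a synthetic price path (the actual BUX closing prices are at bse.hu).
rng = np.random.default_rng(2)
prices = 20000 * np.exp(np.cumsum(0.0002 + 0.007 * rng.normal(size=2256)))
returns = np.diff(np.log(prices))       # 2255 daily returns

# Split into a pre-crisis training set and a crisis/post-crisis validation
# set (the paper splits at the end of June 2007; the index here is a guess
# for illustration only).
split = 870
train, valid = returns[:split], returns[split:]
print(train.size, valid.size)
```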
Descriptive statistics of BUX returns (2004–2012)
Period: 1.2004 – 12.2012 | |
---|---|
Mean | 0.000214 |
SD | 0.007343 |
Skewness | \(-\)0.101284 |
Kurtosis | 9.528772 |
Jarque–Bera test | 4174.147 |
Estimated volatility models BUX returns
Coefficient | Value | SD | p value |
---|---|---|---|
Period: 1.2004–6.2007 | |||
PGARCH(1, 1) | |||
d | 1.9E−08 | 1.21E−07 | 0.8717 |
\(\alpha _0 \) | 0.03902 | 0.032215 | 0.2257 |
\(\alpha _1 \) | \(-\)0.0114 | 0.084162 | 0.8918 |
\(\gamma \) | 0.89404 | 0.039859 | 0.0000 |
\(\beta \) | 2.81050 | 1.201266 | 0.0193 |
EGARCH(1,1) | |||
\(\alpha \) | \(-\)0.4520 | 0.262047 | 0.0845 |
\(\alpha _1 \) | 0.10597 | 0.039903 | 0.0079 |
\(\gamma _1 \) | \(-\)0.0093 | 0.017803 | 0.6006 |
\(\beta _1 \) | 0.96474 | 0.023303 | 0.0000 |
Period: 7.2007–12.2012 | |||
PGARCH(1, 1) | |||
d | 2.9E−06 | 5.69E−06 | 0.6063 |
\(\alpha _0 \) | 0.09100 | 0.017187 | 0.0000 |
\(\alpha _1 \) | 0.30564 | 0.082398 | 0.0002 |
\(\gamma \) | 0.89006 | 0.017038 | 0.0000 |
\(\beta \) | 1.81247 | 0.385426 | 0.0000 |
EGARCH(1,1) | |||
\(\alpha \) | \(-\)0.3421 | 0.06805 | 0.0000 |
\(\alpha _1 \) | 0.18012 | 0.02502 | 0.0000 |
\(\gamma _1 \) | \(-\)0.06690 | 0.01409 | 0.3031 |
\(\beta _1 \) | 0.97965 | 0.00594 | 0.0000 |
In many cases, the basic GARCH-family models (2), (3), (4), modelled with a normal Gaussian error distribution, provide a reasonably good framework for analysing financial time series and estimating conditional volatility. However, some aspects of the model can be improved, so that it better captures the characteristics and dynamics of the data. We therefore re-estimated the asymmetric GARCH models after eliminating the restrictive assumption that their error terms follow a normal distribution. Table 3 presents the Akaike Information Criterion (AIC) and log-likelihood function (LL) values under the assumptions that the residuals follow a Student’s distribution or a Generalized Error Distribution (GED). Table 3 shows that the assumption of Student’s errors fits best.
Building a prediction model for BUX stock time series and results
Information criteria and log-likelihood function for re-estimated volatility models
Criteria model | PGARCH, Eq. (3) | EGARCH, Eq. (4) | Distribution |
---|---|---|---|
Period: 1.2004–6.2007 | |||
AIC | \(-\)7.632126 | \(-\)7.624700 | Normal |
LL | 3444.273 | 3439.928 | |
AIC | \(-\)7.639619 | \(-\)7.636049 | Student’s |
LL | 3448.648 | 3446.040 | |
AIC | \(-\)7.638942 | \(-\)7.633344 | GED |
LL | 3448.344 | 3444.822 | |
Period: 7.2007–12.2012 | |||
AIC | \(-\)7.118371 | \(-\)7.112786 | Normal |
LL | 6601.171 | 6594.997 | |
AIC | \(-\)7.131425 | \(-\)7.128624 | Student’s |
LL | 6614.265 | 6610.670 | |
AIC | \(-\)7.129529 | \(-\)7.126068 | GED |
LL | 6612.509 | 6608.302 |
Statistical approach
As we mentioned earlier, high-frequency financial data, like our BUX stock time series, reflect the stylized fact of variance changing over time. An appropriate model accounting for conditional heteroscedasticity should be able to remove possible non-linear patterns in the data. Various procedures are available to test for the existence of ARCH effects. A commonly used test is the LM (Lagrange Multiplier) test [7]. The LM test assumes the null hypothesis \(H_0 : \alpha _1 =\alpha _2 =\cdots =\alpha _p =0\), i.e. that there is no ARCH. The LM statistic has an asymptotic \(\chi ^{2}\) distribution with p degrees of freedom under the null hypothesis. The ARCH-LM test up to 10 lags was statistically significant for the residuals of the mean equation (21).
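The ARCH-LM test can be sketched on simulated data (statsmodels is used here for illustration, not the software of the paper; a small p-value rejects the null of no ARCH effects):

```python
import numpy as np
from statsmodels.stats.diagnostic import het_arch

# Engle's ARCH-LM test on a simulated ARCH(1) series, for which the test
# should reject the null H0: alpha_1 = ... = alpha_p = 0 (no ARCH).
rng = np.random.default_rng(3)
n, a0, a1 = 2000, 0.1, 0.5
eps = np.empty(n)
sigma2 = a0 / (1 - a1)                  # start at the unconditional variance
for t in range(n):
    eps[t] = np.sqrt(sigma2) * rng.normal()
    sigma2 = a0 + a1 * eps[t] ** 2      # ARCH(1) recursion

lm_stat, lm_pvalue, _, _ = het_arch(eps, 10)   # test up to 10 lags
print(lm_stat, lm_pvalue)               # small p-value -> ARCH present
```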
Estimated mean Eq. (21) for BUX stock time series
Coeff. | Value | SD | p value | D-W |
---|---|---|---|---|
\(\xi \) | 3.938362 | 5.899953 | 0.5045 | 1.994470 |
\(\phi _1 \) | 0.047107 | 0.019045 | 0.0134 |
Information criteria and log-likelihood functions for re-estimated asymmetric variance models (see text for details)
Criteria model | GARCH | PGARCH | EGARCH | Distribution |
---|---|---|---|---|
Period: 1.2004 – 6.2007 | ||||
AIC | \(-\)12.87944 | \(-\)12.48636 | \(-\)12.88632 | Student’s |
LL | 5777.869 | 5611.864 | 5779.957 | |
AIC | \(-\)12.88034 | \(-\)12.48725 | \(-\)12.88750 | GED |
LL | 5778.271 | 5612.262 | 5780.490 |
Re-estimated mean Eq. (21) for BUX stock values, assuming that the random component follows an EGARCH(1,1) GED process
Coeff. | Value | SD | t statistic | D-W |
---|---|---|---|---|
\(\xi \) | 11.6961 | 3.71671 | 3.1469 | 1.93590 |
\(\phi _1 \) | \(-\)0.0157 | 0.03488 | \(-\)0.4493 |
Neuronal and SVM approach
The GRBFNN following the architecture depicted in Fig. 2 was trained using the same variables and data sets as the statistical ARIMA(1,1,0)/EGARCH(1,1) model above. In the GRBFNN, the non-linear forecasting function \(f\left( \mathbf{x} \right) \) was estimated according to expression (10), with the radial basis function \(\psi _2 \left( {./.} \right) \) given by Eq. (13). We used our own implementation of the feed-forward neural network of RBF type with one hidden layer. For the standard neural network, we tested three to ten hidden neurons to achieve the best network results; for every model, only the result of the best configuration is reported. We also used a linear activation function for the output layer. The weights of the investigated networks were initialized randomly from the uniform distribution on \([0, 1)\). The learning rate of the back-propagation algorithm was set to 0.005 to avoid being easily trapped in a local minimum. Final results were taken from the best of 5000 epochs, not from the last epoch, to avoid overfitting the neural network.
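The forward pass of such a Gaussian RBF network with a linear output neuron can be sketched as follows (the centres, width, and output weights below are hypothetical, not the fitted GRBFNN of the paper):

```python
import numpy as np

# Minimal sketch of a Gaussian RBF network forward pass with a linear
# output neuron, as described in the text. All parameters are made up.
def rbf_forward(x, centres, sigma, v):
    """Hidden outputs o_j = exp(-||x - c_j||^2 / (2 sigma^2));
    output y = v . o (linear activation of the output neuron)."""
    d2 = np.sum((centres - x) ** 2, axis=1)
    o = np.exp(-d2 / (2 * sigma ** 2))
    return v @ o

centres = np.array([[0.0], [1.0], [2.0]])    # s = 3 hidden neurons, k = 1
v = np.array([0.5, -1.0, 0.25])              # output weights
y = rbf_forward(np.array([1.0]), centres, sigma=0.8, v=v)
print(y)
```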
The prediction of BUX stock values for the ex post period was also carried out with an SVR model using software developed by Gunn [9], which is an implementation of Vapnik’s Support Vector Machine for problems of pattern recognition, regression, and ranking [23]. To achieve high testing accuracy, a suitable kernel function, its parameters, and the regularization parameter C should be properly selected. Hua and Sun [12] proved that the Gaussian kernel provides superior performance in generalization ability and convergence speed. To set the standard deviation \(\sigma \) of the Gaussian kernel function and the magnitude of the insensitivity zone \(\varepsilon \), we examined various combinations of \(\varepsilon \) and \(\sigma \) and searched for the combination providing the lowest prediction error. The regularization parameter C, which controls the trade-off between complexity and misclassified training examples, was set to \(10^{5}\). The best parameter settings \(\varepsilon , \sigma \) for the estimated SVR function (20) were 0.2 and 0.52, respectively.
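The parameter search described above can be sketched as a simple grid search scored on a held-out subset (scikit-learn here instead of Gunn's toolbox; the data and grid values are illustrative assumptions, not the paper's):

```python
import numpy as np
from sklearn.svm import SVR

# Sketch of the (epsilon, sigma) search: fit on a training subset, score
# each combination by held-out RMSE, keep the best.
rng = np.random.default_rng(4)
X = np.linspace(0, 6, 300).reshape(-1, 1)
y = np.sin(X).ravel() + 0.1 * rng.normal(size=300)
hold = np.arange(300) % 5 == 0                  # every 5th point held out
X_tr, y_tr, X_va, y_va = X[~hold], y[~hold], X[hold], y[hold]

best = None
for eps in (0.05, 0.1, 0.2):
    for sigma in (0.3, 0.52, 1.0):
        gamma = 1.0 / (2 * sigma ** 2)          # Gaussian width via sigma
        m = SVR(kernel="rbf", C=1e3, epsilon=eps, gamma=gamma).fit(X_tr, y_tr)
        rmse = np.sqrt(np.mean((m.predict(X_va) - y_va) ** 2))
        if best is None or rmse < best[0]:
            best = (rmse, eps, sigma)
print(best)                                      # (lowest RMSE, eps, sigma)
```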
Empirical comparison and discussion
Table 7 presents two statistical measures of the models’ forecast accuracy, the Mean Absolute Percentage Error (MAPE) and the Root Mean Square Error (RMSE), calculated over the validation data set, and shows the results of the methods used for comparison. The best performing method is the GRBFNN with error-correction mechanism, followed by SVR and the soft GRBFNN. A comparison between the latest statistical and intelligent methods shows that the intelligent prediction methods outperformed the latest statistical forecasting method. A practical advantage of the error-correction mechanism is that the extent of adjustment in a given period to deviations from long-run equilibrium is given by the estimated input–output equation directly from the neural network, without any further calculation. Further, Table 7 shows that all forecasting models used are very accurate. The development of the error rates on the validation data set indicates a high inherent deterministic relationship among the underlying variables. Though promising results were achieved with all approaches, a purely linear (statistical) approach to modelling relationships does not reflect the reality of chaotic financial markets. For example, if investors do not react to a small change in BUX stock values at first, but react all the more once a certain threshold interval is crossed, then a non-linear relationship between \(\Delta y_t \) and \(\Delta y_{t-1} \) exists in model (20).
The statistical summary measures of model’s forecast accuracy
Model | RMSE | MAPE |
---|---|---|
ARIMA (1,1,0)/EGARCH(1,1) GED | 463.93 | 1.650 |
Soft GRBFNN | 407.17 | 1.453 |
GRBFNN with error-correction mechanism | 295.56 | 1.197 |
SVR | 392.3 | 1.421 |
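The two accuracy measures used in the table can be computed as follows (the toy values below are illustrative, not the BUX forecasts):

```python
import numpy as np

# RMSE and MAPE as used in Table 7; y_true are actual values,
# y_pred the model forecasts (toy data).
def rmse(y_true, y_pred):
    return np.sqrt(np.mean((y_true - y_pred) ** 2))

def mape(y_true, y_pred):
    return 100 * np.mean(np.abs((y_true - y_pred) / y_true))

y_true = np.array([20000.0, 20500.0, 21000.0])
y_pred = np.array([20200.0, 20400.0, 21300.0])
print(rmse(y_true, y_pred), mape(y_true, y_pred))
```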
Conclusion
We investigated the volatility of BUX stock returns using two non-linear asymmetric models: PGARCH(1,1) and EGARCH(1,1). We found that the BUX stock return series exhibits leverage effects. In addition to the leverage effect, it exhibits other stylized facts, such as volatility clustering and leptokurtosis, associated with returns on developed markets. In the case of the BUX returns, we found that the PGARCH(1,1) model with Student’s errors can be an appropriate representation of the asymmetric conditional volatility process in both the pre-crisis and the crisis/post-crisis periods.
For constructing forecasting models for the BUX stock time series, we proposed four approaches. The first was based on the latest statistical ARMA/GARCH methodologies, the second on the soft GRBFNN, and the third on the novel GRBFNN incorporating an error-correction mechanism. The fourth approach was a forecasting model based on the SVR method, a comparatively new learning method that uses a hypothesis space of linear functions in a high-dimensional feature space and is trained with a learning algorithm based on optimization theory.
The performed experiments established that the SVR-based forecasting model predicts the high-frequency BUX stock time series better than the ARMA/GARCH one. A direct comparison of forecast accuracies between the statistical ARMA/GARCH forecasting model and its neural representation shows that both investigated methodologies yield very small MAPE values. Moreover, our experiments show that neural forecasting systems are economical and computationally very efficient, and well suited to high-frequency data forecasting. In the future, more ways of combining prediction techniques will also be tested to see whether hybrid network architectures are better than single ones.
Acknowledgements
This work was supported within Operational Programme Education for Competitiveness—Project No. CZ.1.07./2.3.00./20.0296.
References
- 1.Anderson JA, Rosenfeld E (1988) A collection of articles summarizing the state-of-the-art as of 1988. Neurocomputing: foundations of research. MIT Press, CambridgeGoogle Scholar
- 2.Banerjee AJ, Dolado J, Galbraith JW, Hendry DF (1999) Co-integration, error correction, and the econometric analysis of non-stationary data. Oxford University Press, Oxford. http://www.oxfordscholarship.com/view/10.1093/0198288107.001.0001/acprof-9780198288107
- 3.Bollerslev T (1986) Generalized autoregressive conditional heteroscedasticity. J Econ 31:307–327. http://www.sciencedirect.com/science/article/pii/0304407686900631
- 4.Box GEP, Jenkins GM (1970) Time series analysis, forecasting and control. Holden-Day, San Francisco. http://garfield.library.upenn.edu/classics1989/A1989AV48500001.pdf
- 5.BUX time series data. http://www.bse.hu. Accessed 21 Sep 2014
- 6.Ding Z, Granger CW, Engle RF (1993) A long memory property of stock market returns and a new model. J Empir Finance 1: 83–106. http://www.sciencedirect.com/science/article/pii/0927-5398(93)90006-D
- 7.Engle RF (1982) Autoregressive conditional heteroscedasticity with estimates of the variance of United Kingdom inflation. Econometrica 50(4): 987–1007. http://links.jstor.org/sici?sici=0012-9682%2819820...O%3B2-3&origin=repec
- 8.De Gooijer JG, Hyndman RJ (2006) 25 years of time series forecasting. Int J Forecast 22:443–473. http://www.robjhyndman.com/papers/ijf25.pdf
- 9.Gunn SR (1997) Support vector machines for classification and regression. Technical report, image speech and intelligent systems research group. University of Southampton, Southampton. http://www.isis.ecs.soton.ac.uk/isystems/kernel
- 10.Hecht-Nielsen R (1990) Neuro-computing. Addison-Wesley, ReadingGoogle Scholar
- 11.Hertz H, Krogh A, Palmer RG (1991) Introduction to the theory of neural computation. Westview Press, USAGoogle Scholar
- 12.Hill T, Marquez L, O’Connor M, Remus W (1994) Artificial neural network models for forecasting and decision making. Int J Forecast 10(1):5–15. http://www.sciencedirect.com/science/article/pii/0169207094900450
- 13.Hua S, Sun Z (2001) A method of protein secondary structure prediction with high segment overlap measure: support vector machine approach. J Mol Biol 308:397–407. https://www.ncbi.nlm.nih.gov/pubmed/11327775
- 14.Kecman V (2001) Learning and soft computing—support vector machines, neural networks and fuzzy logic models. Massachusetts Institute of Technology. http://dl.acm.org/citation.cfm?id=559026&preflayout=flat
- 15.Kohonen T (2012) Self-organization and associative memory. Springer, Berlin. http://scholar.google.cz/scholar?q=Kohonen+T+2012+Self-Organization+and+Associative+Memory.+Berlin:+Springer-Verlag&hl=cs&as_sdt=0&as_vis=1&oi=scholart&sa=X&ved=0ahUKEwjh0rzL4c_PAhUMCcAKHZViDw8QgQMIGTAA
- 16.Li D, Du Y (2008) Artificial intelligence with uncertainty. Chapman & Hall/CRC, Taylor & Francis Group, Boca Raton. http://dlia.ir/Scientific/e_book/Science/Cybernetics/006280.pdf
- 17.Magalhães MH, Ballini R, Gomide FAC (2008) Granular models for time-series forecasting. In: Pedrycz W, Skowron A, Kreinovich V (eds) Handbook of granular computing. Wiley, Chichester, pp 949–967Google Scholar
- 18.Marcek D (2004) Some intelligent approaches to stock price modelling and forecasting. J Inf Control Manag Syst 2(1). http://jicms.fri.uniza.sk/index.php/jicms/article/view/742
- 19.Marcek M, Marcek D (2008) Granular RBF neural network implementation of fuzzy systems: application to time series modeling. J Mult.-Valued Logic Soft Comput 14(3–5):401–414. http://dblp2.uni-trier.de/db/journals/mvl/mvl14
- 20.Miao Y, Liu Z, Siew C, Miao C (2001) Dynamical cognitive network an extension of fuzzy cognitive map. IEEE Trans Fuzzy Syst 9(5):760–770. http://dl.acm.org/citation.cfm?id=2234978
- 21.Nelson DB (1991) Conditional heteroskedasticity in asset returns: a new approach. Econometrica 59(2): 347–370. http://scholar.google.sk/scholar?q=Nelson+DB+1991+Conditional+Heteroskedasticity+in+Asset+Returns:+A+New+Approach,+Econometrica+59+(2),+pp+347-370&hl=sk&as_sdt=0&as_vis=1&oi=scholart&sa=X&ved=0ahUKEwjx48mF08bPAhUDSBQKHekaCb8QgQMIGDAA
- 22.Orr MJL (1986) Introduction to radial basis function networks. http://www.anc.ed.ac.uk/rbf/intro/intro.html
- 23.Vapnik V (1990) Statistical learning theory. Wiley, New-York. http://eu.wiley.com/WileyCDA/WileyTitle/productCd-0471030031.html
- 24.Zivot E, Wang EF (2005) Modeling financial time series with S-PLUS. Springer, New York. http://www.springer.com/la/book/9780387279657
Copyright information
Open Access. This article is distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license, and indicate if changes were made.