Abstract
Predicting stock indices and their movements has never lacked attention from traders and professional analysts, because of the attractive financial gains. Over the last two decades, extensive research has combined technical indicators with machine learning techniques to construct effective prediction models. This study investigates the impact of various data normalization methods on the use of support vector machines (SVM) and technical indicators to predict the price movement of a stock index. The experimental results suggest that a prediction system based on SVM and technical indicators should choose its data normalization method carefully, so as to avoid a negative influence on prediction accuracy and on training time.
Acknowledgement
The authors would like to thank the Research and Development Administrative Office of the University of Macau for funding this project, “Building Sustainable Knowledge Networks through Online Communities” (project code MYRG2015-00024-FST).
Appendix A. The Descriptions and Definition of Input Features
1. OSCP_SMA: the difference between the 5-day and 10-day simple moving averages.
$$ OSCP\_SMA_{n,m}(t) = SMA_{n}(C_{t}) - SMA_{m}(C_{t}) \tag{1} $$
where n and m are the lengths of the two time periods, \( SMA_{n}(C_{t}) = \left( \sum_{i=1}^{n} C_{t-i+1} \right)/n \), and \( C_{t} \) is the closing price at time t.
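As a minimal sketch of Eq. (1), the oscillator can be computed directly from a list of closing prices. The function names `sma_at` and `oscp_sma` and the toy price series are illustrative assumptions, not part of the original study; the index t is taken as a zero-based position in the list.

```python
def sma_at(closes, t, n):
    """SMA_n(C_t): mean of the n closing prices ending at index t."""
    if t - n + 1 < 0:
        raise ValueError("need at least n observations up to t")
    return sum(closes[t - n + 1 : t + 1]) / n


def oscp_sma(closes, t, n=5, m=10):
    """Eq. (1): short-window SMA minus long-window SMA at index t."""
    return sma_at(closes, t, n) - sma_at(closes, t, m)


closes = [float(c) for c in range(1, 13)]  # toy closes 1.0 .. 12.0
print(oscp_sma(closes, t=11))  # SMA_5 = 10.0 and SMA_10 = 7.5 here
```

A positive value means the short-term average sits above the long-term one, as in this steadily rising toy series.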
2. OSCP_WMA: the difference between the 5-day and 10-day weighted moving averages.
$$ OSCP\_WMA_{n,m}(t) = WMA_{n}(C_{t}) - WMA_{m}(C_{t}) \tag{2} $$
where \( WMA_{n}(C_{t}) = \left( \sum_{i=1}^{n} (n - i + 1) \cdot C_{t-i+1} \right) / \sum_{i=1}^{n} i \).
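The weighting in Eq. (2) gives the most recent close the weight n and the oldest close in the window the weight 1. A small sketch, with hypothetical function names and a zero-based index t, makes this explicit:

```python
def wma_at(closes, t, n):
    """WMA_n(C_t): linearly weighted mean, newest close weighted n, oldest 1."""
    if t - n + 1 < 0:
        raise ValueError("need at least n observations up to t")
    numerator = sum((n - i + 1) * closes[t - i + 1] for i in range(1, n + 1))
    denominator = n * (n + 1) / 2  # equals sum_{i=1}^{n} i
    return numerator / denominator


def oscp_wma(closes, t, n=5, m=10):
    """Eq. (2): short-window WMA minus long-window WMA at index t."""
    return wma_at(closes, t, n) - wma_at(closes, t, m)


# Weights 1, 2, 3 on closes 1, 2, 3 give (1 + 4 + 9) / 6.
print(wma_at([1.0, 2.0, 3.0], t=2, n=3))
```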
3. OSCP_EMA: the difference between the 5-day and 10-day exponential moving averages.
$$ OSCP\_EMA_{n,m}(t) = EMA_{n}(C_{t}) - EMA_{m}(C_{t}) \tag{3} $$
where \( EMA_{n}(C_{t}) = (1 - \alpha) \cdot EMA_{n}(C_{t-1}) + \alpha \cdot C_{t} \) and \( \alpha = 2/(n + 1) \).
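The recursion in Eq. (3) needs a starting value, which the definition does not state. The sketch below assumes the EMA is seeded with the first closing price; other seeds (for example an initial SMA) are equally plausible. Function names are illustrative.

```python
def ema_series(closes, n):
    """EMA_n for every index, using alpha = 2 / (n + 1).

    Assumption: the recursion is seeded with the first close.
    """
    alpha = 2 / (n + 1)
    ema = [closes[0]]
    for c in closes[1:]:
        ema.append((1 - alpha) * ema[-1] + alpha * c)
    return ema


def oscp_ema(closes, n=5, m=10):
    """Eq. (3) at every index: short EMA minus long EMA."""
    short, long_ = ema_series(closes, n), ema_series(closes, m)
    return [s - l for s, l in zip(short, long_)]


print(ema_series([1.0, 2.0, 3.0], n=3))  # alpha = 0.5
```

Because of the seed, the first several values depend on the starting point, so in practice they are often discarded before training.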
4. BIAS_5: 5-day Bias Ratio. It measures the divergence of the current price from its moving average over an observation period.
$$ BIAS_{n}(t) = 100 \times \frac{C_{t} - SMA_{n}(C_{t})}{SMA_{n}(C_{t})} \tag{4} $$
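A short sketch of Eq. (4) follows; the function name and the toy prices are assumptions for illustration only.

```python
def bias(closes, t, n=5):
    """Eq. (4): percentage deviation of the close from its n-day SMA."""
    if t - n + 1 < 0:
        raise ValueError("need at least n observations up to t")
    sma = sum(closes[t - n + 1 : t + 1]) / n
    return 100 * (closes[t] - sma) / sma


# Four flat days at 10 followed by a jump to 15: the SMA is 11.
print(round(bias([10.0, 10.0, 10.0, 10.0, 15.0], t=4), 2))
```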
5. CCI_20: 20-day Commodity Channel Index. It measures the variation of a security’s price from its statistical mean.
$$ CCI_{n}(t) = \frac{TYP_{t} - SMA_{n}(TYP_{t})}{0.015 \times MAD_{n}(TYP_{t})} \tag{5} $$
where \( MAD_{n}(TYP_{t}) = \left( \sum_{i=1}^{n} \left| TYP_{t-i+1} - SMA_{n}(TYP_{t}) \right| \right)/n \), \( TYP_{t} = (H_{t} + L_{t} + C_{t})/3 \), \( H_{t} \) is the high price at time t, and \( L_{t} \) is the low price at time t.
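Eq. (5) combines three steps: the typical price, its moving average, and the mean absolute deviation around that average. The sketch below follows those steps; the function name is hypothetical, and returning 0.0 when the deviation is zero is an assumption, since the definition leaves that case open.

```python
def cci(highs, lows, closes, t, n=20):
    """Eq. (5): Commodity Channel Index at index t over an n-day window."""
    if t - n + 1 < 0:
        raise ValueError("need at least n observations up to t")
    typ = [(h + l + c) / 3 for h, l, c in zip(highs, lows, closes)]
    window = typ[t - n + 1 : t + 1]
    mean = sum(window) / n
    mad = sum(abs(x - mean) for x in window) / n
    if mad == 0:
        return 0.0  # assumption: flat window treated as "no deviation"
    return (typ[t] - mean) / (0.015 * mad)


prices = [1.0, 2.0, 6.0]  # toy bars with high = low = close
print(cci(prices, prices, prices, t=2, n=3))
```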
6. PDI_14: 14-day Positive Directional Index. It summarizes upward trend movement.
$$ PDI_{n}(t) = 100 \times \frac{MMA_{n}(PDM_{t})}{MMA_{n}(TR_{t})} \tag{6} $$
Where
7. NDI_14: 14-day Negative Directional Index. It summarizes downward trend movement.
$$ NDI_{n}(t) = 100 \times \frac{MMA_{n}(NDM_{t})}{MMA_{n}(TR_{t})} \tag{7} $$
Where
8. DMI_14: 14-day Directional Moving Index. It measures the difference between the positive and negative directional indices.
$$ DMI_{n}(t) = PDI_{n}(t) - NDI_{n}(t) \tag{8} $$
9. slow%K_3: 3-day Stochastic Slow Index K. It is a simple moving average of the 10-day Stochastic Index K, which relates the difference between today’s closing price and the period’s lowest price to the trading range of the period.
$$ slow\%K_{n}(t) = SMA_{n}(\%K_{m}(t)) \tag{9} $$
where \( \%K_{m}(t) = 100 \times (C_{t} - LV_{m})/(HV_{m} - LV_{m}) \), \( LV_{m} \) is the lowest low price over the previous m days, and \( HV_{m} \) is the highest high price over the same period.
10. %D_3: 3-day Stochastic Index D. It is a simple moving average of the Stochastic Slow Index K.
$$ \%D_{n}(t) = SMA_{n}(slow\%K_{n}(t)) \tag{10} $$
11. Stochastic_DIFF: the difference between the Stochastic Slow Index K and the Stochastic Index D.
$$ Stochastic\_DIFF_{n}(t) = slow\%K_{n}(t) - \%D_{n}(t) \tag{11} $$
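The three stochastic features in Eqs. (9) to (11) are nested moving averages of one raw quantity, %K. The sketch below chains them; all function names are hypothetical. It assumes the m-day window includes day t, and it returns 50.0 for a window with no trading range, which the definitions do not cover.

```python
def percent_k(highs, lows, closes, t, m=10):
    """Raw %K_m(t): position of the close inside the m-day high/low range."""
    if t - m + 1 < 0:
        raise ValueError("not enough history for the m-day window")
    lv = min(lows[t - m + 1 : t + 1])
    hv = max(highs[t - m + 1 : t + 1])
    if hv == lv:
        return 50.0  # assumption: midpoint when the range is empty
    return 100 * (closes[t] - lv) / (hv - lv)


def slow_k(highs, lows, closes, t, m=10, n=3):
    """Eq. (9): n-day SMA of %K_m."""
    return sum(percent_k(highs, lows, closes, t - j, m) for j in range(n)) / n


def percent_d(highs, lows, closes, t, m=10, n=3):
    """Eq. (10): n-day SMA of slow %K."""
    return sum(slow_k(highs, lows, closes, t - j, m, n) for j in range(n)) / n


def stochastic_diff(highs, lows, closes, t, m=10, n=3):
    """Eq. (11): slow %K minus %D."""
    return slow_k(highs, lows, closes, t, m, n) - percent_d(highs, lows, closes, t, m, n)


print(percent_k([10.0, 12.0, 11.0], [8.0, 9.0, 9.0], [9.0, 11.0, 10.0], t=2, m=3))
```

With the default windows, the first usable index is t = 13, because %D needs two nested 3-day averages on top of the 10-day range.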
12. PSY_10: 10-day Psychological Line.
$$ PSY_{n}(t) = 100 \times \frac{UD_{n}}{n} \tag{12} $$
where \( UD_{n} \) is the number of days on which the closing price rose during the previous n days.
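Eq. (12) is a simple count. In the sketch below, an "upward day" is assumed to mean a close strictly above the previous day's close, and the n-day window is assumed to end at day t; the function name is illustrative.

```python
def psy(closes, t, n=10):
    """Eq. (12): percentage of the last n days that closed higher."""
    if t - n < 0:
        raise ValueError("need n + 1 closes up to t")
    up_days = sum(1 for i in range(t - n + 1, t + 1) if closes[i] > closes[i - 1])
    return 100 * up_days / n


closes = [1, 2, 1, 2, 3, 2, 3, 4, 5, 4, 5]  # 7 of the 10 daily changes are up
print(psy(closes, t=10))
```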
13. MTM_10: 10-day Momentum. It measures the pace at which a trend is accelerating or decelerating.
$$ MTM_{n}(t) = C_{t} - C_{t-n+1} \tag{13} $$
14. WMTM_9: 9-day simple moving average of MTM_10.
$$ WMTM_{m}(t) = SMA_{m}(MTM_{n}(t)) \tag{14} $$
15. MTM_DIFF: the difference between MTM_10 and WMTM_9.
$$ MTM\_DIFF_{n,m}(t) = MTM_{n}(t) - WMTM_{m}(t) \tag{15} $$
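The three momentum features build on one another, which the following sketch shows with hypothetical function names. It follows the lag convention of Eq. (13), where an n-day momentum compares \( C_{t} \) with \( C_{t-n+1} \).

```python
def mtm(closes, t, n=10):
    """Eq. (13): close at t minus close at t - n + 1."""
    if t - n + 1 < 0:
        raise ValueError("not enough history")
    return closes[t] - closes[t - n + 1]


def wmtm(closes, t, n=10, m=9):
    """Eq. (14): m-day SMA of the n-day momentum."""
    return sum(mtm(closes, t - j, n) for j in range(m)) / m


def mtm_diff(closes, t, n=10, m=9):
    """Eq. (15): momentum minus its moving average."""
    return mtm(closes, t, n) - wmtm(closes, t, n, m)


closes = [float(i * i) for i in range(30)]  # toy accelerating series
print(mtm(closes, 20), wmtm(closes, 20), mtm_diff(closes, 20))
```

On a series that rises at a constant rate the difference is zero; it turns positive only when the rise accelerates, as in this quadratic toy series.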
16. ROC_12: 12-day Price Rate of Change. It measures the percent change between the current price and the price a specified period ago.
$$ ROC_{n}(t) = 100 \times (C_{t} - C_{t-n+1})/C_{t-n+1} \tag{16} $$
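Eq. (16) is the momentum of Eq. (13) expressed as a percentage of the earlier price. A minimal sketch, with an assumed function name and toy data:

```python
def roc(closes, t, n=12):
    """Eq. (16): percent change between close at t and close at t - n + 1."""
    if t - n + 1 < 0:
        raise ValueError("not enough history")
    base = closes[t - n + 1]
    return 100 * (closes[t] - base) / base


closes = [100.0] * 11 + [110.0]  # flat, then a 10% rise on the last day
print(roc(closes, t=11))
```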
17. RSI_14: 14-day Relative Strength Index. It reflects the price strength by comparing upward and downward close-to-close movements over a predetermined period.
$$ RSI_{n}(t) = 100 - \frac{100}{1 + RS_{n}(t)} \tag{17} $$
Where
18. W%R_14: 14-day Williams’ Oscillator Percent R. It helps highlight overbought and oversold levels by measuring the latest closing price in relation to the highest and lowest prices of the price range over an observation period.
$$ W\%R_{n}(t) = -100 \times \frac{HV_{n} - C_{t}}{HV_{n} - LV_{n}} \tag{18} $$
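As a sketch of Eq. (18), the function below returns a value between -100 (close at the window low) and 0 (close at the window high). The function name is hypothetical, the n-day window is assumed to include day t, and returning 0.0 for an empty range is an assumption.

```python
def williams_r(highs, lows, closes, t, n=14):
    """Eq. (18): Williams %R at index t over an n-day window."""
    if t - n + 1 < 0:
        raise ValueError("not enough history")
    hv = max(highs[t - n + 1 : t + 1])
    lv = min(lows[t - n + 1 : t + 1])
    if hv == lv:
        return 0.0  # assumption: no trading range in the window
    return -100 * (hv - closes[t]) / (hv - lv)


print(williams_r([10.0, 12.0, 11.0], [8.0, 9.0, 9.0], [9.0, 11.0, 10.0], t=2, n=3))
```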
19. CMF_21: 21-day Chaikin Money Flow. It compares the summation of volume to the closing price and the daily peaks and falls, so as to determine whether money is flowing in or out during a definite period.
$$ CMF_{n}(t) = 100 \times \frac{\sum_{i=1}^{n} MFV_{t-i+1}}{\sum_{i=1}^{n} V_{t-i+1}} \tag{19} $$
where \( V_{t} \) is the trading volume at time t, and \( MFV_{t} = V_{t} \times (2 \times C_{t} - L_{t} - H_{t})/(H_{t} - L_{t}) \).
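In Eq. (19), each day's volume is weighted by where the close falls within the day's range: +1 at the high, -1 at the low. The sketch below uses a hypothetical function name and assumes that a day with no range (high equal to low) contributes zero money flow.

```python
def cmf(highs, lows, closes, volumes, t, n=21):
    """Eq. (19): Chaikin Money Flow at index t over an n-day window."""
    if t - n + 1 < 0:
        raise ValueError("not enough history")
    mfv_sum = 0.0
    vol_sum = 0.0
    for i in range(t - n + 1, t + 1):
        h, l, c, v = highs[i], lows[i], closes[i], volumes[i]
        if h != l:  # assumption: zero-range days add no money flow
            mfv_sum += v * (2 * c - l - h) / (h - l)
        vol_sum += v
    return 100 * mfv_sum / vol_sum


# Day 0 closes at its high (full inflow), day 1 closes mid-range (neutral).
print(cmf([10.0, 10.0], [8.0, 8.0], [10.0, 9.0], [100.0, 300.0], t=1, n=2))
```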
20. MFI_14: 14-day Money Flow Index. It measures the strength of money flowing in and out of a security.
$$ MFI_{n}(t) = 100 - \frac{100}{1 + MR_{t}} \tag{20} $$
Where
21. RVI_14: 14-day Relative Volatility Index. It indicates the direction of volatility. It is similar to the RSI, except that it measures the 10-day standard deviation (STD) of price changes over a period rather than the absolute price changes.
$$ RVI_{n,m}(t) = 100 \times \frac{SMA_{m}(u_{n}(t))}{SMA_{m}(u_{n}(t)) + SMA_{m}(d_{n}(t))} \tag{21} $$
Where
22. AROON_25: 25-day Aroon Oscillator. It can help determine whether a security is trending or not and how strong the trend is.
$$ Aroon_{n}(t) = 100 \times (HD_{n}(t) - LD_{n}(t))/n \tag{22} $$
where \( HD_{n}(t) \) is the number of days since the highest high price during the n observation days, and \( LD_{n}(t) \) is the number of days since the lowest low price during the n observation days.
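The sketch below implements Eq. (22) literally as printed, with \( HD_{n} \) and \( LD_{n} \) counted as days elapsed since the extreme. Several details are assumptions: the window is taken as the n days ending at t, ties are resolved in favour of the most recent day, and the function names are hypothetical. Some charting references define the oscillator with the opposite sign, so the sign convention should be checked before comparing values with other tools.

```python
def days_since_extreme(values, t, n, pick):
    """Days elapsed at t since the most recent max/min within the n-day window."""
    window = values[t - n + 1 : t + 1]
    target = pick(window)
    return min(j for j in range(n) if values[t - j] == target)


def aroon(highs, lows, t, n=25):
    """Eq. (22) as printed: 100 * (HD_n - LD_n) / n."""
    if t - n + 1 < 0:
        raise ValueError("not enough history")
    hd = days_since_extreme(highs, t, n, max)
    ld = days_since_extreme(lows, t, n, min)
    return 100 * (hd - ld) / n


# Rising toy series: the high is today (HD = 0), the low was 4 days ago (LD = 4).
print(aroon([1, 2, 3, 4, 5], [0, 1, 2, 3, 4], t=4, n=5))
```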
23. EMV: Ease of Movement. It demonstrates how much volume is required to move prices.
$$ EMV_{t} = 1000000 \times (H_{t} - L_{t}) \times \frac{(H_{t} - H_{t-1}) + (L_{t} - L_{t-1})}{2 \times V_{t}} \tag{23} $$
24. WEMV_14: 14-day simple moving average of Ease of Movement.
$$ WEMV_{n}(t) = SMA_{n}(EMV_{t}) \tag{24} $$
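Eqs. (23) and (24) can be sketched as follows. The function names and the toy bars are assumptions; a day with zero volume would raise `ZeroDivisionError` here, since the definition does not say how such days are handled.

```python
def emv(highs, lows, volumes, t):
    """Eq. (23): Ease of Movement at index t (needs the previous day's bar)."""
    if t < 1:
        raise ValueError("need the previous day's high and low")
    day_range = highs[t] - lows[t]
    move = (highs[t] - highs[t - 1]) + (lows[t] - lows[t - 1])
    return 1_000_000 * day_range * move / (2 * volumes[t])


def wemv(highs, lows, volumes, t, n=14):
    """Eq. (24): n-day SMA of EMV."""
    if t - n + 1 < 1:
        raise ValueError("not enough history")
    return sum(emv(highs, lows, volumes, t - j) for j in range(n)) / n


highs = [10.0, 12.0, 12.0]
lows = [8.0, 9.0, 9.0]
volumes = [1_000_000.0] * 3
print(emv(highs, lows, volumes, 1), wemv(highs, lows, volumes, 2, n=2))
```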
25. MACD: Moving Average Convergence and Divergence. It is the difference between the 12-day and 26-day exponential moving averages of the closing price.
$$ MACD_{n1,n2}(t) = EMA_{n1}(C_{t}) - EMA_{n2}(C_{t}) \tag{25} $$
26. MACDSIG: 9-day exponential moving average of MACD.
$$ MACDSIG_{n3}(t) = EMA_{n3}(MACD_{n1,n2}(t)) \tag{26} $$
27. MACD_DIFF: the difference between MACD and MACDSIG.
$$ MACD\_DIFF_{t} = MACD_{n1,n2}(t) - MACDSIG_{n3}(t) \tag{27} $$
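The three MACD features in Eqs. (25) to (27) are all exponential moving averages, so one helper suffices. The sketch below assumes each EMA is seeded with the first value of its input series, which the definitions do not specify; function names are hypothetical.

```python
def ema_series(values, n):
    """EMA with alpha = 2 / (n + 1); assumption: seeded with the first value."""
    alpha = 2 / (n + 1)
    out = [values[0]]
    for v in values[1:]:
        out.append((1 - alpha) * out[-1] + alpha * v)
    return out


def macd_features(closes, n1=12, n2=26, n3=9):
    """Return (MACD, MACDSIG, MACD_DIFF) series per Eqs. (25) to (27)."""
    fast, slow = ema_series(closes, n1), ema_series(closes, n2)
    macd = [f - s for f, s in zip(fast, slow)]
    signal = ema_series(macd, n3)
    diff = [m - s for m, s in zip(macd, signal)]
    return macd, signal, diff


closes = [float(100 + i) for i in range(60)]  # steadily rising toy series
macd, signal, diff = macd_features(closes)
print(macd[-1] > 0)  # the fast EMA tracks a rising price more closely
```

The early values of all three series are dominated by the seed, so a warm-up period of at least n2 days is commonly skipped before the features are used.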
Copyright information
© 2016 Springer Nature Singapore Pte Ltd.
About this paper
Cite this paper
Pan, J., Zhuang, Y., Fong, S. (2016). The Impact of Data Normalization on Stock Market Prediction: Using SVM and Technical Indicators. In: Berry, M., Hj. Mohamed, A., Yap, B. (eds) Soft Computing in Data Science. SCDS 2016. Communications in Computer and Information Science, vol 652. Springer, Singapore. https://doi.org/10.1007/978-981-10-2777-2_7
DOI: https://doi.org/10.1007/978-981-10-2777-2_7
Publisher Name: Springer, Singapore
Print ISBN: 978-981-10-2776-5
Online ISBN: 978-981-10-2777-2
eBook Packages: Computer Science, Computer Science (R0)