The Impact of Data Normalization on Stock Market Prediction: Using SVM and Technical Indicators

  • Conference paper
Soft Computing in Data Science (SCDS 2016)

Part of the book series: Communications in Computer and Information Science (CCIS, volume 652)

Abstract

Predicting stock indices and their movements has never lacked attention among traders and professional analysts because of the attractive financial gains. Over the last two decades, extensive research has combined technical indicators with machine learning techniques to construct effective prediction models. This study investigates the impact of various data normalization methods on the use of support vector machines (SVM) and technical indicators to predict the price movement of a stock index. The experimental results suggest that a prediction system based on SVM and technical indicators should choose its data normalization method carefully, so as to avoid a negative influence on prediction accuracy and on training time.

Acknowledgement

The authors would like to thank the Research and Development Administrative Office of the University of Macau for funding this project, “Building Sustainable Knowledge Networks through Online Communities” (project code MYRG2015-00024-FST).

Author information

Corresponding author

Correspondence to Jiaqi Pan.

Appendix A. The Descriptions and Definition of Input Features

  1. OSCP_SMA: the difference between the 5-day and 10-day simple moving averages

    $$ OSCP\_SMA_{n,m} \left( t \right) = SMA_{n} \left( {C_{t} } \right) - SMA_{m} \left( {C_{t} } \right) $$
    (1)

Where n and m are the lengths of the time periods, \( SMA_{n} \left( {C_{t} } \right) = \left( {\mathop \sum \limits_{i = 1}^{n} C_{t - i + 1} } \right)/n \), and \( C_{t} \) is the closing price at time t.
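As a minimal sketch of Eq. (1), the simple moving average and the resulting oscillator can be computed over a plain list of closing prices. The function names `sma` and `oscp_sma`, the zero-based index `t`, and the sample prices are illustrative assumptions, not part of the paper.

```python
def sma(values, n, t):
    """Simple moving average of the n values ending at index t (inclusive)."""
    if n <= 0 or t - n + 1 < 0:
        raise ValueError("need n > 0 and at least n observations up to t")
    return sum(values[t - n + 1 : t + 1]) / n


def oscp_sma(closes, t, n=5, m=10):
    """Eq. (1): SMA_n(C_t) - SMA_m(C_t)."""
    return sma(closes, n, t) - sma(closes, m, t)


# Hypothetical closing prices 1, 2, ..., 10 (illustrative data only).
closes = [float(c) for c in range(1, 11)]
print(oscp_sma(closes, t=9))  # 8.0 - 5.5 = 2.5
```

A positive value means the short average sits above the long one, which is what a rising series such as the sample produces.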

  2. OSCP_WMA: the difference between the 5-day and 10-day weighted moving averages

    $$ OSCP\_WMA_{n,m} \left( t \right) = WMA_{n} \left( {C_{t} } \right) - WMA_{m} \left( {C_{t} } \right) $$
    (2)

Where \( WMA_{n} \left( {C_{t} } \right) = \left( {\mathop \sum \limits_{i = 1}^{n} ((n - i + 1) \cdot C_{t - i + 1} )} \right)/\mathop \sum \limits_{i = 1}^{n} i \).
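The weighting in Eq. (2) gives the most recent close weight n and the oldest close in the window weight 1. A short sketch, with `wma` and `oscp_wma` as hypothetical helper names and a zero-based index `t`:

```python
def wma(values, n, t):
    """Weighted moving average: weight n on values[t], down to 1 on values[t-n+1]."""
    if n <= 0 or t - n + 1 < 0:
        raise ValueError("need n > 0 and at least n observations up to t")
    weighted = sum((n - i + 1) * values[t - i + 1] for i in range(1, n + 1))
    return weighted / (n * (n + 1) / 2)  # denominator is 1 + 2 + ... + n


def oscp_wma(closes, t, n=5, m=10):
    """Eq. (2): WMA_n(C_t) - WMA_m(C_t)."""
    return wma(closes, n, t) - wma(closes, m, t)
```

For the three closes 1, 2, 3 with n = 3 the weighted sum is 3·3 + 2·2 + 1·1 = 14 and the weights total 6, so the average is 14/6.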

  3. OSCP_EMA: the difference between the 5-day and 10-day exponential moving averages

    $$ OSCP\_EMA_{n,m} \left( t \right) = EMA_{n} \left( {C_{t} } \right) - EMA_{m} \left( {C_{t} } \right) $$
    (3)

Where \( EMA_{n} \left( {C_{t} } \right) = \left( {1 - \alpha } \right) \cdot EMA_{n} \left( {C_{t - 1} } \right) + \alpha \cdot C_{t} \), and \( \alpha = 2/(n + 1) \).
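The recursion for the exponential moving average needs a starting value, which the definition above does not specify. The sketch below assumes the first EMA equals the first closing price; that seed, and the names `ema_series` and `oscp_ema`, are assumptions made for illustration.

```python
def ema_series(values, n):
    """EMA_n for every index, using alpha = 2 / (n + 1)."""
    if not values:
        raise ValueError("values must not be empty")
    alpha = 2.0 / (n + 1)
    out = [float(values[0])]  # assumed seed: EMA at the first point is the price itself
    for c in values[1:]:
        out.append((1.0 - alpha) * out[-1] + alpha * c)
    return out


def oscp_ema(closes, n=5, m=10):
    """Eq. (3) evaluated at every index: EMA_n(C_t) - EMA_m(C_t)."""
    fast, slow = ema_series(closes, n), ema_series(closes, m)
    return [f - s for f, s in zip(fast, slow)]


print(ema_series([1.0, 2.0, 3.0], 3))  # alpha = 0.5 -> [1.0, 1.5, 2.25]
```

Because of the seed, the earliest values depend on the starting point; in practice the first several outputs are often discarded as warm-up.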

  4. BIAS_5: 5-day Bias Ratio. It measures the divergence of the current price from the moving average over an observation period.

    $$ BIAS_{n} (t) = 100 \times \frac{{C_{t} - SMA_{n} \left( {C_{t} } \right)}}{{SMA_{n} \left( {C_{t} } \right)}} $$
    (4)
  5. CCI_20: 20-day Commodity Channel Index. It measures the variation of a security’s price from its statistical mean.

    $$ CCI_{n} (t) = \frac{{TYP_{t} - SMA_{n} (TYP_{t} )}}{{0.015 \times MAD_{n} (TYP_{t} )}} $$
    (5)

Where, \( MAD_{n} \left( {TYP_{t} } \right) = (\mathop \sum \limits_{i = 1}^{n} \left| {TYP_{t - i + 1} - SMA_{n} (TYP_{t} )} \right|)/n \), \( TYP_{t} = (H_{t} + L_{t} + C_{t} )/3 \), \( H_{t} \) is the high price at time t and \( L_{t} \) is the low price at time t.
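Eq. (5) can be sketched directly from the typical price and its mean absolute deviation. The function name `cci`, the zero-based index `t`, and the choice to return 0.0 when the deviation is zero (a flat window) are assumptions made here; the paper does not state how that case is handled.

```python
def cci(highs, lows, closes, t, n=20):
    """Eq. (5): (TYP_t - SMA_n(TYP)) / (0.015 * MAD_n(TYP))."""
    if n <= 0 or t - n + 1 < 0:
        raise ValueError("need n > 0 and at least n observations up to t")
    typ = [(h + l + c) / 3.0 for h, l, c in zip(highs, lows, closes)]
    window = typ[t - n + 1 : t + 1]
    mean = sum(window) / n
    mad = sum(abs(x - mean) for x in window) / n
    if mad == 0:
        return 0.0  # assumed convention for a completely flat window
    return (typ[t] - mean) / (0.015 * mad)


# Hypothetical bars whose typical prices are 1, 2, 3.
prices = [1.0, 2.0, 3.0]
print(cci(prices, prices, prices, t=2, n=3))  # close to 100
```

In the example the mean is 2 and the mean absolute deviation is 2/3, so the index is 1 / (0.015 · 2/3) = 100, up to floating-point rounding.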

  6. PDI_14: 14-day Positive Directional Index. It summarizes upward trend movement.

    $$ PDI_{n} (t) = 100 \times \frac{{MMA_{n} \left( {PDM_{t} } \right)}}{{MMA_{n} \left( {TR_{t} } \right)}} $$
    (6)

Where

$$ PDM_{t} = \left\{ {\begin{array}{*{20}c} {H_{t} - H_{t - 1} , if H_{t} - H_{t - 1} > 0 \,and\, H_{t} - H_{t - 1} > L_{t - 1} - L_{t} } \\ {0, Otherwise} \\ \end{array} } \right. $$
$$ TR_{t} = { \hbox{max} }( \left| {H_{t} - L_{t} } \right|, \left| {H_{t} - C_{t - 1} } \right|, \left| {L_{t} - C_{t - 1} } \right|) $$
$$ MMA_{n} \left( {TR_{t} } \right) = \frac{{\left( {n - 1} \right) \cdot MMA_{n} \left( {TR_{t - 1} } \right) + TR_{t} }}{n} $$
  7. NDI_14: 14-day Negative Directional Index. It summarizes downward trend movement.

    $$ NDI_{n} (t) = 100 \times \frac{{MMA_{n} \left( {NDM_{t} } \right)}}{{MMA_{n} \left( {TR_{t} } \right)}} $$
    (7)

Where

$$ NDM_{t} = \left\{ {\begin{array}{*{20}c} {L_{t - 1} - L_{t} , if L_{t - 1} - L_{t} > 0\, and\, H_{t} - H_{t - 1} < L_{t - 1} - L_{t} } \\ {0, Otherwise} \\ \end{array} } \right. $$
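The two directional indices share the true range and the modified moving average, so they are natural to compute together. The following sketch follows Eqs. (6) and (7); the seed of the smoothing recursion (the first raw value) and the zero returned when the smoothed true range is zero are assumptions, as are the function names.

```python
def mma_series(values, n):
    """Modified moving average: ((n - 1) * previous + current) / n."""
    out = [float(values[0])]  # assumed seed: the first raw value
    for x in values[1:]:
        out.append(((n - 1) * out[-1] + x) / n)
    return out


def directional_indices(highs, lows, closes, n=14):
    """Return (PDI, NDI) lists; element k corresponds to bar k + 1."""
    if len(closes) < 2:
        raise ValueError("need at least two bars")
    pdm, ndm, tr = [], [], []
    for t in range(1, len(closes)):
        up = highs[t] - highs[t - 1]
        down = lows[t - 1] - lows[t]
        pdm.append(up if up > 0 and up > down else 0.0)
        ndm.append(down if down > 0 and up < down else 0.0)
        tr.append(max(abs(highs[t] - lows[t]),
                      abs(highs[t] - closes[t - 1]),
                      abs(lows[t] - closes[t - 1])))
    s_pdm, s_ndm, s_tr = mma_series(pdm, n), mma_series(ndm, n), mma_series(tr, n)
    pdi = [100.0 * p / r if r else 0.0 for p, r in zip(s_pdm, s_tr)]
    ndi = [100.0 * d / r if r else 0.0 for d, r in zip(s_ndm, s_tr)]
    return pdi, ndi


# Hypothetical steadily rising bars.
pdi, ndi = directional_indices([2.0, 3.0, 4.0], [1.0, 2.0, 3.0], [1.5, 2.5, 3.5])
print(pdi[-1], ndi[-1])
```

On the rising sample every bar has an upward move of 1 and a true range of 1.5, so the positive index is 100 · 1/1.5 and the negative index is 0.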
  8. DMI_14: 14-day Directional Moving Index. It measures the difference between the positive and negative directional indices.

    $$ DMI_{n} (t) = PDI_{n} (t) - NDI_{n} (t) $$
    (8)
  9. slow%K_3: 3-day Stochastic Slow Index K. It is a simple moving average of the 10-day Stochastic Index K, which relates the difference between today’s closing price and the period’s lowest price to the trading range.

    $$ slow\% K_{n} \left( t \right) = SMA_{n} \left( {\% K_{m} \left( t \right)} \right) $$
    (9)

Where \( \% K_{m} \left( t \right) = 100 \times (C_{t} - LV_{m} )/(HV_{m} - LV_{m} ) \), \( LV_{m} \) is the lowest low price (in previous m days period), \( HV_{m} \) is the highest high price.
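A sketch of Eq. (9) follows. It assumes the m-day window for the lowest low and highest high ends at day t inclusive, which is one reading of "previous m days"; it also returns 50.0 when the window has no range. Both choices, and the function names, are assumptions.

```python
def percent_k(highs, lows, closes, t, m=10):
    """%K_m(t) = 100 * (C_t - LV_m) / (HV_m - LV_m)."""
    if m <= 0 or t - m + 1 < 0:
        raise ValueError("need m > 0 and at least m observations up to t")
    lowest = min(lows[t - m + 1 : t + 1])
    highest = max(highs[t - m + 1 : t + 1])
    if highest == lowest:
        return 50.0  # assumed convention when the window has no range
    return 100.0 * (closes[t] - lowest) / (highest - lowest)


def slow_percent_k(highs, lows, closes, t, n=3, m=10):
    """Eq. (9): n-day simple moving average of %K_m."""
    return sum(percent_k(highs, lows, closes, t - i, m) for i in range(n)) / n


# Hypothetical bars: a close at the top of the window gives %K = 100.
print(percent_k([1.0, 2.0], [0.0, 1.0], [1.0, 2.0], t=1, m=2))  # 100.0
```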

  10. %D_3: 3-day Stochastic Index D. It is a simple moving average of the Stochastic Slow Index K.

    $$ \% D_{n} \left( t \right) = SMA_{n} \left( {slow\% K_{n} \left( t \right)} \right) $$
    (10)
  11. Stochastic_DIFF: the difference between the Stochastic Slow Index K and the Stochastic Index D.

    $$ Stochastic\_DIFF_{n} \left( t \right) = slow\% K_{n} \left( t \right) - \% D_{n} \left( t \right) $$
    (11)
  12. PSY_10: 10-day Psychological Line.

    $$ PSY_{n} \left( t \right) = 100 \times \left( {\frac{{UD_{n} }}{n}} \right) $$
    (12)

Where \( UD_{n} \) is the number of days on which the closing price rose during the previous n days.
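Eq. (12) is a simple count of up days. In this sketch an up day is taken to be a day whose close is strictly above the previous close, and the n days counted are those ending at index t; both readings are assumptions, as is the name `psy`.

```python
def psy(closes, t, n=10):
    """Eq. (12): percentage of the last n days on which the close rose."""
    if n <= 0 or t - n < 0:
        raise ValueError("need n > 0 and n + 1 observations up to t")
    ups = sum(1 for i in range(t - n + 1, t + 1) if closes[i] > closes[i - 1])
    return 100.0 * ups / n


# Hypothetical closes: up, down, up, up over the last four days.
print(psy([1.0, 2.0, 1.0, 2.0, 3.0], t=4, n=4))  # 75.0
```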

  13. MTM_10: 10-day Momentum. It measures the pace at which a trend is accelerating or decelerating.

    $$ MTM_{n} \left( t \right) = C_{t} - C_{t - n + 1} $$
    (13)
  14. WMTM_9: 9-day simple moving average of MTM_10.

    $$ WMTM_{m} \left( t \right) = SMA_{m} \left( {MTM_{n} \left( t \right)} \right) $$
    (14)
  15. MTM_DIFF: the difference between MTM_10 and WMTM_9.

    $$ MTM\_DIFF_{n,m} \left( t \right) = MTM_{n} \left( t \right) - WMTM_{m} \left( t \right) $$
    (15)
  16. ROC_12: 12-day Price Rate of Change. It measures the percent change between the current price and the price a specified period ago.

    $$ ROC_{n} \left( t \right) = 100 \times (C_{t} - C_{t - n + 1} )/C_{t - n + 1} $$
    (16)
  17. RSI_14: 14-day Relative Strength Index. It reflects the price strength by comparing upward and downward close-to-close movements over a predetermined period.

    $$ RSI_{n} (t) = 100 - \frac{100}{{1 + RS_{n} (t)}} $$
    (17)

Where

$$ RS_{n} (t) = \frac{{SMA_{n} \left( {Close\_Up_{t} } \right)}}{{SMA_{n} \left( {Close\_Down_{t} } \right)}} $$
$$ Close\_Up_{t} = \left\{ {\begin{array}{*{20}c} {C_{t} - C_{t - 1} , \,if\, C_{t} - C_{t - 1} > 0 } \\ {0 , Otherwise} \\ \end{array} } \right. $$
$$ Close\_Down_{t} = \left\{ {\begin{array}{*{20}c} {C_{t - 1} - C_{t} , \, if\, C_{t - 1} - C_{t} > 0 } \\ {0 , Otherwise} \\ \end{array} } \right. $$
  18. W%R_14: 14-day Williams’ Oscillator Percent R. It helps highlight overbought/oversold levels by measuring the latest closing price in relation to the highest and lowest prices of the range over an observation period.

    $$ W\% R_{n} \left( t \right) = - 100 \times \frac{{HV_{n} - C_{t} }}{{HV_{n} - LV_{n} }} $$
    (18)
  19. CMF_21: 21-day Chaikin Money Flow. It relates the summed volume to the closing price and the daily highs and lows, so as to determine whether money is flowing in or out during a given period.

    $$ CMF_{n} (t) = 100 \times \frac{{\mathop \sum \nolimits_{i = 1}^{n} MFV_{t - i + 1} }}{{\mathop \sum \nolimits_{i = 1}^{n} V_{t - i + 1} }} $$
    (19)

Where \( V_{t} \) is the trading volume at time t, and \( MFV_{t} = V_{t} \times \left( {2 \times C_{t} - L_{t} - H_{t} } \right)/(H_{t} - L_{t} ) \).
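The money flow volume is the day's volume scaled by where the close sits between the low and the high. A sketch of Eq. (19), assuming that a bar with zero range contributes no money flow and that zero total volume yields 0.0 (neither case is covered by the definition):

```python
def cmf(highs, lows, closes, volumes, t, n=21):
    """Eq. (19): 100 * sum(MFV) / sum(V) over the n bars ending at t."""
    if n <= 0 or t - n + 1 < 0:
        raise ValueError("need n > 0 and at least n observations up to t")
    mfv_sum = vol_sum = 0.0
    for i in range(t - n + 1, t + 1):
        bar_range = highs[i] - lows[i]
        if bar_range > 0:  # assumed: a zero-range bar adds no money flow
            mfv_sum += volumes[i] * (2 * closes[i] - lows[i] - highs[i]) / bar_range
        vol_sum += volumes[i]
    return 100.0 * mfv_sum / vol_sum if vol_sum else 0.0


# Hypothetical bars that both close at their highs.
print(cmf([2.0, 3.0], [1.0, 2.0], [2.0, 3.0], [100.0, 200.0], t=1, n=2))  # 100.0
```

A close at the high makes the multiplier +1, a close at the low makes it -1, and a close at the midpoint makes it 0.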

  20. MFI_14: 14-day Money Flow Index. It measures the strength of money flowing in and out of a security.

    $$ MFI_{n} (t) = 100 - \frac{100}{{1 + MR_{n} (t)}} $$
    (20)

Where

$$ MR_{n} (t) = \frac{{\mathop \sum \nolimits_{i = 1}^{n} MF_{t - i + 1}^{+} }}{{\mathop \sum \nolimits_{i = 1}^{n} MF_{t - i + 1}^{-} }} $$
$$ MF_{t}^{+} = \left\{ {\begin{array}{*{20}c} {TYP_{t} \times V_{t} , \,if\, TYP_{t} > TYP_{t - 1} } \\ {0 , Otherwise} \\ \end{array} } \right. $$
$$ MF_{t}^{-} = \left\{ {\begin{array}{*{20}c} {TYP_{t} \times V_{t} , \,if\, TYP_{t} < TYP_{t - 1} } \\ {0 , Otherwise} \\ \end{array} } \right. $$
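The Money Flow Index has the same shape as the RSI, with volume-weighted typical prices in place of price changes. The sketch below sums positive and negative money flow over the n bars ending at t; returning 100.0 when there is no negative flow and 50.0 when there is no flow at all are assumed conventions, and `mfi` is a hypothetical name.

```python
def mfi(highs, lows, closes, volumes, t, n=14):
    """Eq. (20): 100 - 100 / (1 + MR), MR = positive flow / negative flow."""
    if n <= 0 or t - n < 0:
        raise ValueError("need n > 0 and n + 1 observations up to t")
    typ = [(h + l + c) / 3.0 for h, l, c in zip(highs, lows, closes)]
    pos = neg = 0.0
    for i in range(t - n + 1, t + 1):
        flow = typ[i] * volumes[i]
        if typ[i] > typ[i - 1]:
            pos += flow
        elif typ[i] < typ[i - 1]:
            neg += flow
    if neg == 0:
        return 100.0 if pos > 0 else 50.0  # assumed conventions for undefined MR
    return 100.0 - 100.0 / (1.0 + pos / neg)


# Hypothetical bars: typical price goes 1 -> 2 -> 1 with volumes 1, 1, 2.
p = [1.0, 2.0, 1.0]
print(mfi(p, p, p, [1.0, 1.0, 2.0], t=2, n=2))  # 50.0
```

In the sample the up day carries a flow of 2 · 1 and the down day a flow of 1 · 2, so the ratio is 1 and the index is 50.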
  21. RVI_14: 14-day Relative Volatility Index. It indicates the direction of volatility and is similar to the RSI, except that it measures the 10-day standard deviation (STD) of price changes over a period rather than the absolute price changes.

    $$ RVI_{n, m} (t) = 100 \times \frac{{SMA_{m} \left( {u_{n} (t)} \right)}}{{SMA_{m} \left( {u_{n} (t)} \right) + SMA_{m} \left( {d_{n} (t)} \right)}} $$
    (21)

Where

$$ u_{n} (t) = \left\{ {\begin{array}{*{20}c} {STD_{n} (C_{t} ), \,if\, C_{t} > C_{t - 1} } \\ {0 , Otherwise} \\ \end{array} } \right.,\,\,d_{n} (t) = \left\{ {\begin{array}{*{20}c} {STD_{n} (C_{t} ), \,if\, C_{t} < C_{t - 1} } \\ {0 , Otherwise} \\ \end{array} } \right. $$
$$ STD_{n} \left( {C_{t} } \right) = \sqrt {\frac{{\mathop \sum \nolimits_{i = 1}^{n} \left( {C_{t - i + 1} - SMA_{n} \left( {C_{t} } \right)} \right)^{2} }}{n}} $$
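A sketch of Eq. (21) follows, reading the description as n = 10 for the standard deviation and m = 14 for the averaging window. That parameter reading, the 50.0 returned when neither up nor down volatility is recorded, and the function names are assumptions.

```python
import math


def std(closes, t, n):
    """Population standard deviation of the n closes ending at index t."""
    window = closes[t - n + 1 : t + 1]
    mean = sum(window) / n
    return math.sqrt(sum((c - mean) ** 2 for c in window) / n)


def rvi(closes, t, n=10, m=14):
    """Eq. (21): 100 * SMA_m(u) / (SMA_m(u) + SMA_m(d))."""
    if n <= 0 or m <= 0 or t - m < 0 or t - m - n + 2 < 0:
        raise ValueError("not enough history for the requested windows")
    up = down = 0.0
    for i in range(t - m + 1, t + 1):
        s = std(closes, i, n)
        if closes[i] > closes[i - 1]:
            up += s
        elif closes[i] < closes[i - 1]:
            down += s
    if up + down == 0:
        return 50.0  # assumed convention for a flat window
    # The 1/m factors of the two averages cancel in the ratio.
    return 100.0 * (up / (up + down))


# Hypothetical closes alternating between 1 and 2.
print(rvi([1.0, 2.0, 1.0, 2.0, 1.0, 2.0, 1.0], t=6, n=2, m=4))  # 50.0
```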
  22. AROON_25: 25-day Aroon Oscillator. It can help determine whether a security is trending or not and how strong the trend is.

    $$ Aroon_{n} \left( t \right) = 100 \times \left( {HD_{n} \left( t \right) - LD_{n} \left( t \right)} \right)/n $$
    (22)

Where \( HD_{n} \left( t \right) \) is the number of days since the highest high price during the n observation days, and \( LD_{n} \left( t \right) \) is the number of days since the lowest low price during the n observation days.

  23. EMV: Ease of Movement. It demonstrates how much volume is required to move prices.

    $$ EMV_{t} = 1000000 \times (H_{t} - L_{t} )\frac{{(H_{t} - H_{t - 1} ) + (L_{t} - L_{t - 1} )}}{{2 \times V_{t} }} $$
    (23)
  24. WEMV_14: 14-day simple moving average of Ease of Movement.

    $$ WEMV_{n} \left( t \right) = SMA_{n} \left( {EMV_{t} } \right) $$
    (24)
  25. MACD: Moving Average Convergence and Divergence. It is the difference between the 12-day and 26-day exponential moving averages of the closing price.

    $$ MACD_{n1, n2} (t) = EMA_{n1} \left( {C_{t} } \right) - EMA_{n2} \left( {C_{t} } \right) $$
    (25)
  26. MACDSIG: 9-day exponential moving average of MACD

    $$ MACDSIG_{n3} (t) = EMA_{n3} (MACD_{n1, n2} (t)) $$
    (26)
  27. MACD_DIFF: the difference between MACD and MACDSIG

    $$ MACD\_DIFF_{t} = MACD_{n1, n2} \left( t \right) - MACDSIG_{n3} (t) $$
    (27)
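The features defined in this appendix sit on very different scales: RSI_14 and PSY_10 lie between 0 and 100, W%R_14 lies between -100 and 0, and difference features such as MTM_10 are measured in index points. The specific normalization methods compared in the paper are not listed in this excerpt, so the sketch below uses min-max and z-score scaling purely as two common examples. Fitting the statistics on the training rows only and reusing them for later data is a design choice assumed here, and all function names are hypothetical.

```python
import math


def fit_min_max(train_column):
    """Return (min, max) of a feature column from the training set."""
    return min(train_column), max(train_column)


def apply_min_max(column, lo, hi):
    """Scale values to [0, 1] using training-set bounds (0.0 if the span is zero)."""
    span = hi - lo
    return [(x - lo) / span if span else 0.0 for x in column]


def fit_z_score(train_column):
    """Return (mean, population standard deviation) from the training set."""
    mean = sum(train_column) / len(train_column)
    var = sum((x - mean) ** 2 for x in train_column) / len(train_column)
    return mean, math.sqrt(var)


def apply_z_score(column, mean, std):
    """Center and scale values using training-set statistics (0.0 if std is zero)."""
    return [(x - mean) / std if std else 0.0 for x in column]


# Hypothetical feature column split into training and test rows.
train, test = [0.0, 5.0, 10.0], [2.5, 12.0]
lo, hi = fit_min_max(train)
print(apply_min_max(train, lo, hi))  # [0.0, 0.5, 1.0]
print(apply_min_max(test, lo, hi))   # test values may fall outside [0, 1]
```

Test values outside the training range map outside [0, 1] under min-max scaling, which is one reason the choice of method can matter for a margin-based classifier such as SVM.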

Copyright information

© 2016 Springer Nature Singapore Pte Ltd.

About this paper

Cite this paper

Pan, J., Zhuang, Y., Fong, S. (2016). The Impact of Data Normalization on Stock Market Prediction: Using SVM and Technical Indicators. In: Berry, M., Hj. Mohamed, A., Yap, B. (eds) Soft Computing in Data Science. SCDS 2016. Communications in Computer and Information Science, vol 652. Springer, Singapore. https://doi.org/10.1007/978-981-10-2777-2_7

  • DOI: https://doi.org/10.1007/978-981-10-2777-2_7

  • Publisher Name: Springer, Singapore

  • Print ISBN: 978-981-10-2776-5

  • Online ISBN: 978-981-10-2777-2

  • eBook Packages: Computer Science, Computer Science (R0)
