Composite leading search index: a preprocessing method of internet search data for stock trends prediction

Annals of Operations Research

Abstract

Previous studies have shown that Internet search data is a new source of data that can be used to predict the stock market. In this new, data-driven research field, the choice of data preprocessing method is crucial to prediction accuracy. This paper proposes a preprocessing method for Internet search data, the composite leading search index (CLSI), which consists of three steps: (a) keyword selection, (b) time difference measurement, and (c) leading index composition. We demonstrate the validity of CLSI by comparing its results with those of the search volume index (SVI), the method most commonly used in the previous literature. We build a time series (TS) model with error correction and a support vector regression (SVR) model for stock trend prediction, and combine them with the two preprocessing methods into four models for comparison: SVI–TS, CLSI–TS, SVI–SVR, and CLSI–SVR. We test these four models on the Chinese stock market, which is attracting a growing number of investors, and analyze the results on nine datasets: the stable, peak, and trough periods of the Shanghai Composite Index, the Shenzhen Composite Index, and the Hushen 300 Index. The results show that, with either TS or SVR as the forecasting model, CLSI performs better than SVI on the majority of the test datasets and performs almost the same as SVI on the remaining ones. These results suggest, to some extent, that CLSI is a more efficient preprocessing method for Internet search data in stock trend prediction.
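The three CLSI steps named above can be illustrated with a minimal Python sketch. The abstract does not specify how each step is carried out, so everything below is an assumption made for illustration only: keywords are selected by a threshold on lagged Pearson correlation, the time difference is taken as the lag with the largest absolute correlation, and the index is a correlation-weighted average of standardized, lag-shifted search series. The function names, `max_lag`, and `min_corr` are hypothetical and do not come from the paper.

```python
import numpy as np

def lagged_corr(x, y, lag):
    """Pearson correlation between x[t - lag] and y[t] for lag >= 1."""
    a, b = x[: len(x) - lag], y[lag:]
    if a.std() == 0 or b.std() == 0:
        return 0.0  # a constant series carries no leading information
    return float(np.corrcoef(a, b)[0, 1])

def best_lead(x, y, max_lag):
    """Return (lag, corr) with the largest |corr| over lags 1..max_lag."""
    scores = [(lag, lagged_corr(x, y, lag)) for lag in range(1, max_lag + 1)]
    return max(scores, key=lambda s: abs(s[1]))

def composite_leading_index(searches, target, max_lag=5, min_corr=0.3):
    """Assumed CLSI-style pipeline (illustrative, not the authors' procedure).

    searches: dict mapping keyword -> 1-D search volume series
    target:   1-D stock index series of the same length
    Returns (index, chosen); index[t] uses only search data from before t.
    """
    target = np.asarray(target, dtype=float)
    n = len(target)
    chosen = {}
    for kw, series in searches.items():
        lag, corr = best_lead(np.asarray(series, dtype=float), target, max_lag)
        if abs(corr) >= min_corr:      # (a) keyword selection (assumed rule)
            chosen[kw] = (lag, corr)   # (b) time difference = best lead lag
    index = np.full(n, np.nan)
    if not chosen:
        return index, chosen
    start = max(lag for lag, _ in chosen.values())
    total_weight = sum(abs(corr) for _, corr in chosen.values())
    acc = np.zeros(n - start)
    for kw, (lag, corr) in chosen.items():
        s = np.asarray(searches[kw], dtype=float)
        z = (s - s.mean()) / s.std()            # standardize each series
        acc += corr * z[start - lag : n - lag]  # align z[t - lag] with t
    index[start:] = acc / total_weight          # (c) weighted composition
    return index, chosen
```

The first `start` entries of the returned index are NaN because no lagged search data exists for them. In this sketch the composite would then replace the raw search volume as the input feature of the TS or SVR forecasting model.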

Notes

  1. Available at http://www.csie.ntu.edu.tw/~cjlin/libsvm/index.html.

  2. In convex optimization, when the KKT conditions are satisfied, the original (primal) problem and its dual problem attain the same optimal value. In applications of support vector machines (both classification and regression), the dual problem is solved rather than the original problem.
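For reference, the primal and dual problems mentioned in note 2 can be written out for ε-insensitive support vector regression. This is the standard textbook formulation, not a reproduction of the paper's equations, and the symbols (w, b, ξ, α, C, ε, kernel K with feature map φ) are the usual conventions and are assumed here.

```latex
% Primal problem (epsilon-SVR)
\min_{w,\,b,\,\xi,\,\xi^*} \quad
  \frac{1}{2}\lVert w \rVert^2 + C \sum_{i=1}^{n} (\xi_i + \xi_i^*)
\quad \text{s.t.} \quad
\begin{cases}
  y_i - w^\top \phi(x_i) - b \le \varepsilon + \xi_i, \\
  w^\top \phi(x_i) + b - y_i \le \varepsilon + \xi_i^*, \\
  \xi_i,\ \xi_i^* \ge 0, \quad i = 1, \dots, n.
\end{cases}

% Dual problem, with K(x_i, x_j) = \phi(x_i)^\top \phi(x_j)
\max_{\alpha,\,\alpha^*} \quad
  -\frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n}
    (\alpha_i - \alpha_i^*)(\alpha_j - \alpha_j^*)\, K(x_i, x_j)
  - \varepsilon \sum_{i=1}^{n} (\alpha_i + \alpha_i^*)
  + \sum_{i=1}^{n} y_i (\alpha_i - \alpha_i^*)
\quad \text{s.t.} \quad
  \sum_{i=1}^{n} (\alpha_i - \alpha_i^*) = 0, \qquad
  0 \le \alpha_i,\ \alpha_i^* \le C.

% Resulting regression function
f(x) = \sum_{i=1}^{n} (\alpha_i - \alpha_i^*)\, K(x_i, x) + b
```

Solving the dual is convenient because the data enter only through the kernel values K(x_i, x_j), so the feature map φ never has to be computed explicitly.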

Acknowledgments

This work has been partially supported by the National Natural Science Foundation of China under Grants 71202115, 71172199, 71201143, and 70972104, the Beijing Natural Science Foundation under Grant 9143021, the Postdoctoral Science Foundation of China under Grant 2013T60158, and sponsorship from the China Scholarship Council (CSC).

Author information

Corresponding author

Correspondence to Geng Peng.

Additional information

The authors are grateful to the anonymous reviewers and editors for their helpful comments and suggestions to improve the paper.

About this article

Cite this article

Liu, Y., Chen, Y., Wu, S. et al. Composite leading search index: a preprocessing method of internet search data for stock trends prediction. Ann Oper Res 234, 77–94 (2015). https://doi.org/10.1007/s10479-014-1779-z

  • DOI: https://doi.org/10.1007/s10479-014-1779-z
