Integrating Multiple Data Sources for Stock Prediction

  • Di Wu
  • Gabriel Pui Cheong Fung
  • Jeffrey Xu Yu
  • Zheng Liu
Part of the Lecture Notes in Computer Science book series (LNCS, volume 5175)


In many real-world applications, decisions are made by collecting and weighing information from multiple data sources. Take the stock market as an example. We rarely base our decisions on a single piece of advice; instead, we rely on a collection of information, such as stock price movements, trading volumes and market indices, as well as news articles, expert comments and special announcements (e.g., an increase in stamp duty). Yet modeling the stock market is difficult because: (1) the process governing market states (up and down) is stochastic, which makes it hard to capture with a deterministic approach; and (2) the market state is invisible, although it is influenced by visible market information such as stock prices and news articles. In this paper, we model the stock market process with a Non-homogeneous Hidden Markov Model (NHMM) that takes multiple sources of information into account when making a prediction. Our model contains three major elements: (1) external events, which denote events happening within the stock market (e.g., a drop in the US interest rate); (2) the observed market state, which denotes the current market status (e.g., a rise in the stock price); and (3) the hidden market state, which conceptually exists but is invisible to market participants. Specifically, we model external events using the information contained in news articles, and model the observed market state using historical stock prices. Based on these two pieces of observable information and the previous hidden market state, we aim to identify the current hidden market state, so as to predict the immediate market movement. Extensive experiments were conducted to evaluate our work, and the encouraging results indicate that the proposed approach is practically sound and effective.
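The abstract does not specify how the NHMM is parameterized, so the following is only a minimal sketch under stated assumptions, not the authors' method. It assumes discrete hidden states (e.g., bullish/bearish), a discrete observation alphabet (price down/up), and one transition matrix per external-event category (e.g., a news label). The "non-homogeneous" part is that the transition matrix used at step t is selected by the event e_t observed at that step. The current hidden state is then tracked with a normalized forward (filtering) recursion, alpha_t(j) proportional to B[j, o_t] * sum_i alpha_{t-1}(i) * A^(e_t)[i, j]. The function names filter_hidden_states and predict_next, and every number in the example, are hypothetical.

```python
import numpy as np


def filter_hidden_states(obs, events, pi, A_by_event, B):
    """Forward filtering for a discrete NHMM (hypothetical parameterization).

    obs:        observation indices per step (assumed: 0 = price down, 1 = price up)
    events:     external-event indices per step; events[t] selects the transition
                matrix for the move from step t-1 to step t (events[0] is unused)
    pi:         initial hidden-state distribution, shape (K,)
    A_by_event: shape (E, K, K); A_by_event[e][i, j] = P(s_t=j | s_{t-1}=i, e_t=e)
    B:          shape (K, M); B[j, o] = P(o_t=o | s_t=j)
    Returns an array of shape (T, K) whose row t is P(s_t | o_1..o_t, e_1..e_t).
    """
    T, K = len(obs), len(pi)
    alpha = np.zeros((T, K))
    # First step: prior times emission likelihood, then normalize.
    a = pi * B[:, obs[0]]
    alpha[0] = a / a.sum()
    for t in range(1, T):
        # Event-dependent transition matrix: this is what makes the chain non-homogeneous.
        A_t = A_by_event[events[t]]
        a = (alpha[t - 1] @ A_t) * B[:, obs[t]]
        alpha[t] = a / a.sum()
    return alpha


def predict_next(alpha_last, next_event, A_by_event, B):
    """One-step-ahead distributions over the hidden state and the next observation."""
    state_pred = alpha_last @ A_by_event[next_event]
    obs_pred = state_pred @ B
    return state_pred, obs_pred


# Hypothetical example: hidden states 0 = bullish, 1 = bearish;
# events 0 = negative news, 1 = positive news.
pi = np.array([0.5, 0.5])
A_by_event = np.array([
    [[0.6, 0.4], [0.2, 0.8]],  # negative news: drift toward bearish
    [[0.9, 0.1], [0.4, 0.6]],  # positive news: drift toward bullish
])
B = np.array([
    [0.3, 0.7],  # bullish state mostly emits "up"
    [0.7, 0.3],  # bearish state mostly emits "down"
])

obs = [1, 1, 0, 0]     # up, up, down, down
events = [1, 1, 0, 0]  # positive, positive, negative, negative
alpha = filter_hidden_states(obs, events, pi, A_by_event, B)
state_pred, obs_pred = predict_next(alpha[-1], 0, A_by_event, B)
print(alpha.round(4))
print(obs_pred.round(4))
```

With these made-up parameters, two down days following negative news push the filtered belief strongly toward the bearish state (about 0.82), and the one-step prediction under another negative-news event favors a further price drop (about 0.59). A per-event lookup table is the simplest way to let news condition the transitions; a model with many event features would more likely compute the transition probabilities from those features, for instance with a multinomial logistic function.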





Copyright information

© Springer-Verlag Berlin Heidelberg 2008

Authors and Affiliations

  • Di Wu (1)
  • Gabriel Pui Cheong Fung (2)
  • Jeffrey Xu Yu (1)
  • Zheng Liu (1)

  1. Dept. of Systems Engineering & Engineering Management, The Chinese University of Hong Kong
  2. School of Information Technology & Electrical Engineering, The University of Queensland
