Abstract
Many real-world applications rely on a decision-making process that depends on a model built from information collected from different data sources. Take the stock market as an example. The decision-making process depends on a model that is influenced by factors such as stock prices, exchange volumes, market indices (e.g., the Dow Jones Index), news articles, and government announcements (e.g., an increase in stamp duty). Nevertheless, modeling the stock market is challenging because (1) the process governing market states (rise state/drop state) is stochastic, which is hard to capture with a deterministic approach, and (2) the market state is hidden, yet it is influenced by visible market information such as stock prices and news articles. In this paper, we propose an approach that models the stock market process with a Non-homogeneous Hidden Markov Model (NHMM), which takes both stock prices and news articles into account. A unique feature of our approach is that it is event driven: when building the NHMM, we identify the events associated with a specific stock using a set of bursty features (keywords) that have a significant impact on stock price changes. We apply the model to predict the trend of future stock prices, and the encouraging results indicate that the proposed approach is practically sound and highly effective.
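To make the idea of a non-homogeneous hidden Markov model concrete, the following is a minimal sketch and not the model from the paper. It assumes one common NHMM formulation in which the transition probabilities between the hidden rise and drop states are a logistic function of an observed covariate. Here that covariate is a hypothetical 0/1 flag marking whether a bursty keyword event occurred on a given day. The Gaussian emission model for daily returns, the function names, and all parameter values are illustrative assumptions; the abstract does not specify any of them.

```python
import math

# Hidden states, indexed 0 = "rise", 1 = "drop" (labels taken from the abstract).
STATES = ("rise", "drop")


def transition_matrix(burst, base_logits, burst_weights):
    """Covariate-dependent transition matrix (assumed logistic form).

    burst is a hypothetical 0/1 flag for a bursty keyword event.
    Row i holds [P(rise | state i, burst), P(drop | state i, burst)].
    """
    rows = []
    for i in range(len(STATES)):
        z = base_logits[i] + burst_weights[i] * burst
        p_rise = 1.0 / (1.0 + math.exp(-z))
        rows.append([p_rise, 1.0 - p_rise])
    return rows


def gaussian_pdf(x, mean, std):
    """Density of a normal distribution, used as an assumed emission model."""
    return math.exp(-0.5 * ((x - mean) / std) ** 2) / (std * math.sqrt(2.0 * math.pi))


def forward_filter(returns, bursts, init, emission, base_logits, burst_weights):
    """Return P(state_t | observations up to t) for every time step t."""
    n = len(STATES)
    alpha = [init[s] * gaussian_pdf(returns[0], *emission[s]) for s in range(n)]
    total = sum(alpha)
    alpha = [a / total for a in alpha]
    filtered = [alpha]
    for t in range(1, len(returns)):
        # The transition matrix changes with the covariate at time t,
        # which is what makes the chain non-homogeneous.
        A = transition_matrix(bursts[t], base_logits, burst_weights)
        predicted = [sum(alpha[i] * A[i][j] for i in range(n)) for j in range(n)]
        alpha = [predicted[j] * gaussian_pdf(returns[t], *emission[j]) for j in range(n)]
        total = sum(alpha)
        alpha = [a / total for a in alpha]
        filtered.append(alpha)
    return filtered


if __name__ == "__main__":
    # Made-up illustrative inputs, not data or parameters from the paper.
    daily_returns = [0.010, -0.020, 0.015, -0.005]
    burst_flags = [0, 1, 0, 0]
    posterior = forward_filter(
        daily_returns,
        burst_flags,
        init=[0.5, 0.5],
        emission=[(0.01, 0.02), (-0.01, 0.02)],  # (mean, std) per state
        base_logits=[1.0, -1.0],
        burst_weights=[0.5, 2.0],
    )
    for day, probs in enumerate(posterior):
        print(day, dict(zip(STATES, probs)))
```

Setting both entries of `burst_weights` to zero removes the dependence on the covariate and reduces the sketch to an ordinary homogeneous hidden Markov model, which isolates what the event signal contributes.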
Cite this article
Wu, D., Fung, G.P.C., Yu, J.X. et al. Stock prediction: an event-driven approach based on bursty keywords. Front. Comput. Sci. China 3, 145–157 (2009). https://doi.org/10.1007/s11704-009-0029-z