Abstract
Online stock forums have become a vital platform for publishing relevant and valuable user-generated content (UGC), such as investment recommendations and trading ideas, which lets investors view and exchange the opinions of a large number of users. This paper combines text mining, feature selection and Bayesian Networks to analyze and extract sentiments from stock-related micro-blogging messages called “StockTwits”. We investigate whether the collective sentiment of StockTwits can be predicted and how the predicted sentiments might help investors and their peers make profitable investment decisions in the stock market. Specifically, we build Bayesian Networks from terms in the tweets that are selected using wrapper feature selection. We then use textual visualization to provide a better understanding of the predicted relationships among sentiments and their related features.
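As a rough illustration of the pipeline the abstract outlines, the following minimal sketch runs greedy forward wrapper feature selection around a Bernoulli naive Bayes classifier, which can be viewed as the simplest Bayesian network structure (the class node is the sole parent of every term node). It is not the authors' implementation: the toy messages, their labels, the function names and the leave-one-out scoring are all assumptions made for illustration only.

```python
import math
from collections import Counter

# Hypothetical toy messages with hand-assigned labels (not data from the paper).
MESSAGES = [
    ("strong earnings buy the breakout", "bullish"),
    ("buy calls ahead of earnings", "bullish"),
    ("breakout above resistance looks strong", "bullish"),
    ("long here target higher", "bullish"),
    ("weak guidance sell the rally", "bearish"),
    ("sell puts look cheap short it", "bearish"),
    ("short below support looks weak", "bearish"),
    ("downgrade today target lower", "bearish"),
]


def tokenize(text):
    """Text-mining step: reduce a message to its set of lowercase terms."""
    return set(text.lower().split())


def train(docs, features):
    """Count class frequencies and per-class term presence for `features`."""
    class_counts = Counter(label for _, label in docs)
    term_counts = {label: Counter() for label in class_counts}
    for tokens, label in docs:
        term_counts[label].update(tokens & features)
    return class_counts, term_counts


def predict(model, features, tokens):
    """Return the class with the highest naive Bayes log-posterior."""
    class_counts, term_counts = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(class_counts):
        n = class_counts[label]
        score = math.log(n / total)
        for f in sorted(features):
            p = (term_counts[label][f] + 1) / (n + 2)  # Laplace smoothing
            score += math.log(p if f in tokens else 1 - p)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def loo_accuracy(docs, features):
    """Wrapper criterion: leave-one-out accuracy of the classifier itself."""
    hits = 0
    for i, (tokens, label) in enumerate(docs):
        model = train(docs[:i] + docs[i + 1:], features)
        hits += predict(model, features, tokens) == label
    return hits / len(docs)


def wrapper_select(docs, max_features=5):
    """Greedy forward search: add the term that most improves accuracy."""
    vocab = sorted(set().union(*(tokens for tokens, _ in docs)))
    selected = set()
    best = loo_accuracy(docs, selected)
    while len(selected) < max_features:
        candidates = [(loo_accuracy(docs, selected | {f}), f)
                      for f in vocab if f not in selected]
        if not candidates:
            break
        score, feature = max(candidates)
        if score <= best:  # stop when no single term improves the score
            break
        selected.add(feature)
        best = score
    return selected, best


DOCS = [(tokenize(text), label) for text, label in MESSAGES]
selected, accuracy = wrapper_select(DOCS)
print(sorted(selected), accuracy)
```

The wrapper scores each candidate term subset by the accuracy of the classifier it is meant to serve, rather than by a classifier-independent statistic, which is what distinguishes it from filter-style selection. A richer Bayesian network would additionally learn dependencies among the selected terms.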
Copyright information
© 2014 Springer International Publishing Switzerland
About this paper
Cite this paper
Al Nasseri, A., Tucker, A., de Cesare, S. (2014). Big Data Analysis of StockTwits to Predict Sentiments in the Stock Market. In: Džeroski, S., Panov, P., Kocev, D., Todorovski, L. (eds) Discovery Science. DS 2014. Lecture Notes in Computer Science, vol 8777. Springer, Cham. https://doi.org/10.1007/978-3-319-11812-3_2
DOI: https://doi.org/10.1007/978-3-319-11812-3_2
Publisher Name: Springer, Cham
Print ISBN: 978-3-319-11811-6
Online ISBN: 978-3-319-11812-3
eBook Packages: Computer Science, Computer Science (R0)