Abstract
Accurate stock market prediction is of great interest to investors; however, stock markets are driven by volatile factors such as microblogs and news, which make it hard to predict a stock market index from historical data alone. This high volatility underscores the need to assess the role of external factors in stock prediction. Because information in social media and financial news can change investors’ behavior, machine learning algorithms applied to this data can be used to predict stock markets. In this paper, we apply such algorithms to social media and financial news data to measure the impact of this data on stock market prediction accuracy over ten subsequent days. To improve the performance and quality of predictions, we perform feature selection and spam tweet reduction on the data sets. Moreover, we perform experiments to identify stock markets that are difficult to predict and those that are more influenced by social media and financial news. We compare the results of different algorithms to find a consistent classifier. Finally, to achieve maximum prediction accuracy, we use deep learning and ensemble some classifiers. Our experimental results show that the highest prediction accuracies of 80.53% and 75.16% are achieved using social media and financial news, respectively. We also show that the New York and Red Hat stock markets are hard to predict, that New York and IBM stocks are more influenced by social media, and that London and Microsoft stocks are more influenced by financial news. The random forest classifier is found to be consistent, and the highest accuracy of 83.22% is achieved by its ensemble.
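The abstract mentions ensembling classifiers but does not say which combination scheme is used. As a minimal sketch of one common option, hard majority voting, the following Python combines the up/down predictions of several classifiers into one label per day. The function names (`majority_vote`, `accuracy`), the labels, and all prediction lists are hypothetical and chosen only for illustration; they are not the paper's data or results.

```python
from collections import Counter


def majority_vote(predictions):
    """Combine per-classifier label lists into one label per sample.

    predictions: list of equal-length lists, one list per classifier.
    Ties go to the earliest classifier (in list order) whose label
    is among the most frequent ones. This tie rule is an assumption.
    """
    if not predictions:
        raise ValueError("need predictions from at least one classifier")
    n_samples = len(predictions[0])
    if any(len(p) != n_samples for p in predictions):
        raise ValueError("all classifiers must predict the same samples")

    combined = []
    for votes in zip(*predictions):  # votes for a single sample
        counts = Counter(votes)
        top = max(counts.values())
        combined.append(next(v for v in votes if counts[v] == top))
    return combined


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label matches the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


# Hypothetical daily movement labels and three hypothetical classifiers.
y_true = ["up", "down", "up", "up", "down"]
clf_a = ["up", "down", "down", "up", "down"]
clf_b = ["up", "up", "up", "up", "down"]
clf_c = ["down", "down", "up", "down", "down"]

ensemble = majority_vote([clf_a, clf_b, clf_c])
print("ensemble:", ensemble)
print("accuracy:", accuracy(y_true, ensemble))
```

In this toy case each classifier errs on a different day, so the vote corrects every individual mistake. Real classifiers tend to make correlated errors, so an ensemble is not guaranteed to beat its best member.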
Cite this article
Khan, W., Ghazanfar, M.A., Azam, M.A. et al. Stock market prediction using machine learning classifiers and social media, news. J Ambient Intell Human Comput 13, 3433–3456 (2022). https://doi.org/10.1007/s12652-020-01839-w
DOI: https://doi.org/10.1007/s12652-020-01839-w