Stock market prediction using machine learning classifiers and social media, news

Abstract

Accurate stock market prediction is of great interest to investors; however, stock markets are driven by volatile factors such as microblogs and news, which make it hard to predict a stock market index from historical data alone. The enormous volatility of stock markets underscores the need to effectively assess the role of external factors in stock prediction. Stock markets can be predicted by applying machine learning algorithms to information contained in social media and financial news, as this data can change investors’ behavior. In this paper, we apply such algorithms to social media and financial news data to measure the impact of this data on stock market prediction accuracy for the ten subsequent days. To improve the performance and quality of predictions, we perform feature selection and spam tweet reduction on the data sets. Moreover, we perform experiments to identify which stock markets are difficult to predict and which are more influenced by social media and financial news. We compare the results of different algorithms to find a consistent classifier. Finally, to achieve maximum prediction accuracy, we use deep learning and ensemble several classifiers. Our experimental results show that the highest prediction accuracies of 80.53% and 75.16% are achieved using social media and financial news, respectively. We also show that the New York and Red Hat stock markets are hard to predict, that New York and IBM stocks are more influenced by social media, and that London and Microsoft stocks are more influenced by financial news. The random forest classifier is found to be consistent, and its ensemble achieves the highest accuracy of 83.22%.
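The abstract does not specify how the prediction target is defined or which ensembling scheme is used, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes a binary up/down target (the close `horizon` days ahead compared with the current close, so `horizon` could range over the ten subsequent days) and combines base classifiers by hard majority voting. All names (`direction_labels`, `majority_vote`, `threshold_rule`) and the three daily features are hypothetical, and the fixed-threshold rules merely stand in for trained models such as random forests.

```python
from collections import Counter
from typing import Callable, List, Sequence

# Hypothetical type: a "classifier" maps one feature vector to a label (0 = down, 1 = up).
Classifier = Callable[[Sequence[float]], int]


def direction_labels(closes: Sequence[float], horizon: int = 1) -> List[int]:
    """Label day t as 1 if the close `horizon` days later is higher, else 0.

    The last `horizon` days have no future close, so they get no label.
    """
    return [int(closes[t + horizon] > closes[t]) for t in range(len(closes) - horizon)]


def majority_vote(classifiers: Sequence[Classifier], x: Sequence[float]) -> int:
    """Hard majority vote over the base classifiers; ties go to the smaller label."""
    votes = Counter(clf(x) for clf in classifiers)
    top = max(votes.values())
    return min(label for label, count in votes.items() if count == top)


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of days whose predicted direction matches the true direction."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def threshold_rule(feature: int) -> Classifier:
    """Toy stand-in for a trained model: predict 'up' when one feature is positive."""
    return lambda x: int(x[feature] > 0)


# Contrived example: six closing prices give five next-day labels.
closes = [100.0, 101.5, 101.0, 102.2, 101.8, 103.0]
y = direction_labels(closes, horizon=1)  # [1, 0, 1, 0, 1]

# Hypothetical daily features: (tweet sentiment, news sentiment, price momentum).
X = [
    (0.5, -0.2, 0.3),
    (0.4, -0.1, -0.2),
    (-0.3, 0.6, 0.1),
    (-0.2, -0.4, 0.5),
    (0.1, 0.3, -0.6),
]

base = [threshold_rule(i) for i in range(3)]
base_acc = [accuracy(y, [clf(x) for x in X]) for clf in base]
ensemble_pred = [majority_vote(base, x) for x in X]

print("base accuracies:", base_acc)  # [0.6, 0.8, 0.6]
print("ensemble accuracy:", accuracy(y, ensemble_pred))  # 1.0
```

On this contrived data each toy rule errs on different days, so the vote recovers every label; real classifiers trained on overlapping features make correlated errors and usually gain far less. For trained models, scikit-learn's `RandomForestClassifier` and `VotingClassifier` with `voting="hard"` implement these two ideas.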

Figs. 1–20 (images not reproduced here)

Notes

  1. https://www.businessinsider.com.
  2. http://www.finance.yahoo.com.
  3. https://www.stocktwits.com.
  4. https://www.weibo.com.
  5. https://www.reuters.com.
  6. https://www.bloomberg.com.
  7. https://github.com/Jefferson-Henrique/GetOldTweets-python.
  8. http://www.finet.hk/mainsite/index.htm.
  9. https://jsoup.org.

Author information

Corresponding author

Correspondence to Mustansar Ali Ghazanfar.

About this article

Cite this article

Khan, W., Ghazanfar, M.A., Azam, M.A. et al. Stock market prediction using machine learning classifiers and social media, news. J Ambient Intell Human Comput (2020). https://doi.org/10.1007/s12652-020-01839-w

Keywords

  • Deep learning
  • Feature selection
  • Hybrid algorithm
  • Natural language processing
  • Predictive modeling
  • Sentiment analysis
  • Stock market prediction