Stock market prediction using machine learning classifiers and social media, news

  • Original Research
Journal of Ambient Intelligence and Humanized Computing

Abstract

Accurate stock market prediction is of great interest to investors. However, stock markets are driven by volatile factors such as microblogs and news, which make it hard to predict a stock market index from historical data alone. This enormous volatility underscores the need to assess the role of external factors in stock prediction. Stock markets can be predicted by applying machine learning algorithms to the information contained in social media and financial news, because this information can change investors’ behavior. In this paper, we apply such algorithms to social media and financial news data to measure the impact of this data on stock market prediction accuracy over ten subsequent days. To improve the performance and quality of predictions, we perform feature selection and spam tweet reduction on the data sets. Moreover, we perform experiments to identify which stock markets are difficult to predict and which are more influenced by social media and financial news. We compare the results of different algorithms to find a consistent classifier. Finally, to achieve maximum prediction accuracy, we use deep learning and ensemble some classifiers. Our experimental results show that the highest prediction accuracies of 80.53% and 75.16% are achieved using social media and financial news, respectively. We also show that the New York and Red Hat stock markets are hard to predict, that New York and IBM stocks are more influenced by social media, and that London and Microsoft stocks are more influenced by financial news. The random forest classifier is found to be consistent, and the highest accuracy of 83.22% is achieved by its ensemble.
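The abstract does not say how the classifiers are ensembled, so the following is only a minimal sketch of one common scheme, hard majority voting over up/down labels, and not necessarily the method used in the paper. The function names, the tie-breaking rule, and the toy label sequences are all hypothetical and chosen purely for illustration.

```python
from collections import Counter
from typing import List, Sequence


def majority_vote(predictions: Sequence[Sequence[int]]) -> List[int]:
    """Combine per-classifier label sequences by hard majority vote.

    predictions[i][t] is classifier i's label for day t (1 = up, 0 = down).
    Ties go to the smallest label, which is an arbitrary assumption.
    """
    if not predictions:
        raise ValueError("need at least one classifier")
    n_days = len(predictions[0])
    if any(len(p) != n_days for p in predictions):
        raise ValueError("all classifiers must predict the same days")
    combined = []
    for day_votes in zip(*predictions):
        counts = Counter(day_votes)
        top = max(counts.values())
        combined.append(min(label for label, c in counts.items() if c == top))
    return combined


def accuracy(predicted: Sequence[int], actual: Sequence[int]) -> float:
    """Fraction of days on which the predicted label matches the actual one."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("sequences must be non-empty and of equal length")
    return sum(p == a for p, a in zip(predicted, actual)) / len(actual)


# Hypothetical toy data (not from the paper): ten days, three classifiers.
actual = [1, 0, 1, 1, 0, 0, 1, 0, 1, 1]
clf_a = [1, 0, 0, 1, 0, 1, 1, 0, 1, 1]
clf_b = [1, 1, 1, 1, 0, 0, 0, 0, 1, 0]
clf_c = [0, 0, 0, 1, 1, 0, 1, 0, 0, 1]

ensemble = majority_vote([clf_a, clf_b, clf_c])
ensemble_accuracy = accuracy(ensemble, actual)
```

On this toy data the vote corrects every day on which only one classifier errs, and it fails on the one day where two classifiers make the same mistake. Voting helps only to the extent that the members' errors are not shared.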

Author information

Corresponding author

Correspondence to Mustansar Ali Ghazanfar.

Cite this article

Khan, W., Ghazanfar, M.A., Azam, M.A. et al. Stock market prediction using machine learning classifiers and social media, news. J Ambient Intell Human Comput 13, 3433–3456 (2022). https://doi.org/10.1007/s12652-020-01839-w

  • DOI: https://doi.org/10.1007/s12652-020-01839-w
