Stock market prediction with time series data and news headlines: a stacking ensemble approach

Corizzo, Roberto; Rosen, Jacob

doi:10.1007/s10844-023-00804-1

Research
Published: 23 July 2023

Volume 62, pages 27–56 (2024)

Journal of Intelligent Information Systems

Abstract

Time series forecasting models are gaining traction in many real-world domains as valuable decision support tools. Stock market analysis is a challenging domain, characterized by a complex multi-variate and time-evolving nature, with high volatility, and multiple correlations with exogenous factors. Autoregressive, machine learning, and deep learning models for temporal data have been adopted thus far to solve this task. However, they are usually limited to the analysis of a single data source or modality, and do not collectively deal with all the inherent challenges and complexities presented by stock market data. In this paper, inspired by the promising learning capabilities of hybrid ensemble methods, we propose a novel stacking ensemble approach for stock market prediction that jointly considers news headlines, multi-variate time series data, and multiple base models as predictors. By taking multiple factors into consideration, our model is able to learn historical patterns leveraging multiple data sources and models. Our experiments showcase the ability of our model to outperform popular baselines on next-day stock market trend prediction. A portfolio analysis reveals that our method is also able to yield potential gains or capital preservation capabilities when its predictions are exploited for trading decisions.
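The stacking idea described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the three base models below are hypothetical rule-based stand-ins (the abstract does not specify the base models), the meta-learner is a plain logistic regression chosen for brevity, and the return and sentiment series are synthetic values generated only so the example runs. All function names are assumptions made for illustration.

```python
import math
import random

# Three toy base models (hypothetical stand-ins, not the paper's models).
# Each maps a window of past daily returns plus a headline sentiment score
# to an "up" (1.0) or "down" (0.0) vote for the next day.
def momentum_model(window, sentiment):
    return 1.0 if window[-1] > 0 else 0.0

def mean_return_model(window, sentiment):
    return 1.0 if sum(window) / len(window) > 0 else 0.0

def headline_model(window, sentiment):
    return 1.0 if sentiment > 0 else 0.0

BASE_MODELS = [momentum_model, mean_return_model, headline_model]

def base_predictions(window, sentiment):
    """Level-0 step: collect one vote per base model."""
    return [model(window, sentiment) for model in BASE_MODELS]

def build_meta_dataset(returns, sentiments, window_size=3):
    """Pair base-model votes made on day t-1 with the observed trend on day t."""
    features, labels = [], []
    for t in range(window_size, len(returns)):
        window = returns[t - window_size:t]        # returns up to day t-1
        features.append(base_predictions(window, sentiments[t - 1]))
        labels.append(1 if returns[t] > 0 else 0)  # next-day trend
    return features, labels

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_meta(features, labels, epochs=200, lr=0.1):
    """Level-1 step: logistic regression fitted by batch gradient descent."""
    n, d = len(features), len(features[0])
    weights, bias = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for x, y in zip(features, labels):
            err = sigmoid(bias + sum(w * v for w, v in zip(weights, x))) - y
            grad_w = [g + err * v for g, v in zip(grad_w, x)]
            grad_b += err
        weights = [w - lr * g / n for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b / n
    return weights, bias

def predict_meta(weights, bias, x):
    """Predict 1 (up) when the meta-learner's probability is at least 0.5."""
    z = bias + sum(w * v for w, v in zip(weights, x))
    return 1 if z >= 0 else 0

# Synthetic demo data (NOT from the paper): returns loosely follow the
# previous day's sentiment, plus Gaussian noise.
rng = random.Random(0)
sentiments = [rng.uniform(-1, 1) for _ in range(120)]
returns = [0.0] + [0.01 * s + rng.gauss(0, 0.005) for s in sentiments[:-1]]

X, y = build_meta_dataset(returns, sentiments)
split = int(0.8 * len(X))  # chronological split: train on the past only
weights, bias = fit_meta(X[:split], y[:split])
hits = sum(predict_meta(weights, bias, x) == label
           for x, label in zip(X[split:], y[split:]))
print(f"hold-out accuracy on synthetic data: {hits / len(y[split:]):.2f}")
```

Because the base models here are fixed rules, their votes can be fed to the meta-learner directly. With trained base models, the meta-learner would normally be fitted on predictions for data the base models did not see during training, so that it does not simply learn to trust overfitted outputs.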

Availability of data and materials

All data sources used for our experiments are public and disclosed in the paper.

Notes

https://pypi.org/project/yahooquery/
https://pypi.org/project/TA-Lib/
https://eodhistoricaldata.com/
The dataset used in our work and the implementation of the proposed approach are publicly available at: https://github.com/rcorizzo/stock-market-stacking/
https://pypi.org/project/pmdarima/
https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.GradientBoostingClassifier.html
https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.AdaBoostClassifier.html
All results are conveniently accessible at: http://www.rcorizzo.com/stock-market/stacking/
Additional high-resolution portfolio analysis plots considering other stacking models are conveniently accessible at: http://www.rcorizzo.com/stock-market/stacking/

Acknowledgements

We acknowledge the support of NVIDIA through a donation of a Titan V GPU.

Funding

Not applicable

Author information

Authors and Affiliations

Department of Computer Science, American University, 4400 Massachusetts Avenue NW, Washington, DC 20016, United States
Roberto Corizzo & Jacob Rosen

Contributions

Roberto Corizzo: Conceptualization, Methodology, Investigation, Software, Writing, Resources, Supervision.
Jacob Rosen: Data Curation, Software, Visualization, Writing (Review & Editing).

Corresponding author

Correspondence to Roberto Corizzo.

Ethics declarations

Conflict of interest/Competing interests

The authors declare that there are no financial or non-financial interests directly or indirectly related to the work submitted for publication.

Ethics approval

Not applicable

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Corizzo, R., Rosen, J. Stock market prediction with time series data and news headlines: a stacking ensemble approach. J Intell Inf Syst 62, 27–56 (2024). https://doi.org/10.1007/s10844-023-00804-1

Received: 07 May 2023
Revised: 08 July 2023
Accepted: 10 July 2023
Published: 23 July 2023
Issue Date: February 2024
DOI: https://doi.org/10.1007/s10844-023-00804-1
