
Dynamic and context-dependent stock price prediction using attention modules and news sentiment

  • Original Article
  • Published in Digital Finance

Abstract

The growth of machine-readable data in finance, such as alternative data, requires new modeling techniques that can handle non-stationary and non-parametric data. Because of the underlying causal dependence and the size and complexity of the data, we propose a new modeling approach for financial time series data, the \(\alpha _{t}\)-RIM (recurrent independent mechanism). This architecture uses key-value attention to integrate top-down and bottom-up information in a context-dependent and dynamic way. To model the data in such a dynamic manner, the \(\alpha _{t}\)-RIM utilizes an exponentially smoothed recurrent neural network, which can model non-stationary time series data, combined with a modular and independent recurrent structure. We apply our approach to the closing prices of three selected stocks of the S&P 500 universe as well as their news sentiment scores. The results suggest that the \(\alpha _{t}\)-RIM is capable of reflecting the causal structure between stock prices and news sentiment, as well as the seasonality and trends. Consequently, this modeling approach markedly improves the generalization performance, that is, the prediction of unseen data, and outperforms state-of-the-art networks such as long short-term memory models.
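For orientation, the exponential smoothing at the core of the architecture can be sketched as follows. This is a minimal formulation, assuming the smoothed hidden state is a convex combination of the current hidden state and its own past, with a time-varying smoothing coefficient \(\alpha _{t}\); the exact gating in the full model may differ:

\[ \tilde{h}_{t} = \alpha _{t}\, h_{t} + (1 - \alpha _{t})\, \tilde{h}_{t-1}, \qquad \alpha _{t} \in [0, 1], \]

where \(h_{t}\) is the hidden state produced by the recurrent cell at time \(t\) and \(\tilde{h}_{t}\) is its exponentially smoothed counterpart. An \(\alpha _{t}\) close to 1 tracks the most recent dynamics, while an \(\alpha _{t}\) close to 0 retains the long-run level, which is what allows the network to adapt to non-stationary series.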


Data availability

The datasets generated during and/or analyzed during the current study are available in this repository: https://github.com/QuantLet/alpha_t-RIM.


Funding

No funding was received for conducting this study.

Author information


Corresponding author

Correspondence to Nicole Königstein.

Ethics declarations

Conflict of interest

The author has no competing interests to declare that are relevant to the content of this article.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Acknowledgements

The author would like to thank YUKKA Lab, Berlin, for providing the raw data for this research. Furthermore, the author thanks Matthew Dixon and Saeed Amen, who provided significant support with their insights and expertise. Finally, the author thanks Jörg Osterrieder for his comments and suggestions on this paper.

Appendix A

A.1 The \(\alpha _{t}\)-RIM hyper-parameters

Because the model places constraints on the hyper-parameters (e.g., the number of RIMs must be less than or equal to the number of modules \(k\)), standard cross-validation could not be performed. Therefore, a dedicated function was implemented to generate a list of dictionaries, which was then fed into the grid search as a parameter grid; a sketch of such a function is given after the list below. The list encompasses the following parameters:

  • Units: 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 25, 30, 35, 40, 45, 50

  • Number of RIMs: 4, 6, 8, 10, 12, 14

  • K modules: 4, 6, 8, 10, 12, 14

  • Input key size: 4, 6, 8, 10, 12

  • Input value size: 4, 6, 8, 10, 12

  • Input query size: 4, 6, 8, 10, 12

  • Input keep probability: 0.6, 0.7, 0.8, 0.9

  • Number of communication heads: 2, 4, 6, 8

  • Communication key size: 4, 6, 8, 10, 12

  • Communication value size: 4, 6, 8, 10, 12

  • Communication query size: 4, 6, 8, 10, 12

  • Communication keep probability: 0.6, 0.7, 0.8, 0.9
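
As an illustration, here is a minimal sketch of such a grid-generating function. The parameter names (units, num_rims, k_modules, and so on) and the helper itself are hypothetical and only demonstrate the filtering pattern; the actual implementation in the accompanying repository may differ. Only a subset of the parameters above is shown for brevity; the remaining ones follow the same pattern.

from itertools import product

# Candidate values taken from the list above (subset for brevity).
# All names here are illustrative, not the repository's actual API.
param_values = {
    "units": [2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 25, 30, 35, 40, 45, 50],
    "num_rims": [4, 6, 8, 10, 12, 14],
    "k_modules": [4, 6, 8, 10, 12, 14],
    "input_key_size": [4, 6, 8, 10, 12],
    "input_keep_prob": [0.6, 0.7, 0.8, 0.9],
}

def make_param_grid(values):
    """Return a list of hyper-parameter dictionaries, keeping only
    combinations that satisfy the architectural constraint that the
    number of RIMs is at most the number of modules k."""
    keys = list(values)
    grid = []
    for combo in product(*(values[key] for key in keys)):
        params = dict(zip(keys, combo))
        if params["num_rims"] <= params["k_modules"]:
            grid.append(params)
    return grid

grid = make_param_grid(param_values)
# Each dictionary in `grid` can then be fed to the grid search,
# e.g. by building and evaluating one model per entry.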

A.2 Complete training results

A.2.1 Evaluation metrics

AMAZON

See Tables 3–38.

Table 3 Univariate 5 input lags
Table 4 Bivariate 5 input lags
Table 5 Univariate 10 input lags
Table 6 Bivariate 10 input lags
Table 7 Univariate 21 input lags
Table 8 Bivariate 21 input lags

BROWN-FORMAN

Table 9 Univariate 5 input lags
Table 10 Bivariate 5 input lags
Table 11 Univariate 10 input lags
Table 12 Bivariate 10 input lags
Table 13 Univariate 21 input lags
Table 14 Bivariate 21 input lags

THERMO FISHER

Table 15 Univariate 5 input lags
Table 16 Bivariate 5 input lags
Table 17 Univariate 10 input lags
Table 18 Bivariate 10 input lags
Table 19 Univariate 21 input lags
Table 20 Bivariate 21 input lags

A.2.2 Re-scaled metrics

AMAZON

Table 21 Univariate 5 input lags
Table 22 Bivariate 5 input lags
Table 23 Univariate 10 input lags
Table 24 Bivariate 10 input lags
Table 25 Univariate 21 input lags
Table 26 Bivariate 21 input lags

BROWN-FORMAN

Table 27 Univariate 5 input lags
Table 28 Bivariate 5 input lags
Table 29 Univariate 10 input lags
Table 30 Bivariate 10 input lags
Table 31 Univariate 21 input lags
Table 32 Bivariate 21 input lags

THERMO FISHER

Table 33 Univariate 5 input lags
Table 34 Bivariate 5 input lags
Table 35 Univariate 10 input lags
Table 36 Bivariate 10 input lags
Table 37 Univariate 21 input lags
Table 38 Bivariate 21 input lags

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article


Cite this article

Königstein, N. Dynamic and context-dependent stock price prediction using attention modules and news sentiment. Digit Finance 5, 449–481 (2023). https://doi.org/10.1007/s42521-023-00089-7

