Abstract
The growth of machine-readable data in finance, such as alternative data, requires new modeling techniques that can handle non-stationary and non-parametric data. Due to the underlying causal dependence and the size and complexity of the data, we propose a new modeling approach for financial time series data, the \(\alpha _{t}\)-RIM (recurrent independent mechanism). This architecture makes use of key–value attention to integrate top-down and bottom-up information in a context-dependent and dynamic way. To model the data in such a dynamic manner, the \(\alpha _{t}\)-RIM utilizes an exponentially smoothed recurrent neural network, which can model non-stationary time series data, combined with a modular and independent recurrent structure. We apply our approach to the closing prices of three selected stocks of the S&P 500 universe as well as their news sentiment scores. The results suggest that the \(\alpha _{t}\)-RIM is capable of reflecting the causal structure between stock prices and news sentiment, as well as the seasonality and trends. Consequently, this modeling approach markedly improves the generalization performance, that is, the prediction of unseen data, and outperforms state-of-the-art networks such as long short-term memory models.
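The exponential smoothing at the core of the architecture can be illustrated with a minimal sketch: each raw hidden state \(h_t\) is blended with the previous smoothed state via a smoothing factor. In the sketch below the factor is a fixed constant for clarity, whereas in the \(\alpha _{t}\)-RIM the time-varying \(\alpha _{t}\) is learned by the network; the function name and shapes are illustrative, not the paper's implementation.

```python
import numpy as np

def exp_smooth_states(hidden_states, alpha):
    """Exponentially smooth a sequence of RNN hidden states.

    hidden_states: array of shape (T, d) with the raw states h_t.
    alpha: smoothing factor in (0, 1]; alpha = 1 reproduces the raw states.
    Returns h_tilde_t = alpha * h_t + (1 - alpha) * h_tilde_{t-1}.
    """
    smoothed = np.zeros_like(hidden_states, dtype=float)
    smoothed[0] = hidden_states[0]
    for t in range(1, len(hidden_states)):
        smoothed[t] = alpha * hidden_states[t] + (1 - alpha) * smoothed[t - 1]
    return smoothed

# A spike at t=1 is damped in the smoothed trajectory:
states = np.array([[1.0], [3.0], [2.0]])
print(exp_smooth_states(states, 0.5))
```

Smoothing the recurrent state in this way filters high-frequency noise while retaining slow-moving structure such as trends, which is one motivation for combining it with the modular RIM units on non-stationary financial series.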
Data availability
The datasets generated during and/or analyzed during the current study are available in this repository: https://github.com/QuantLet/alpha_t-RIM.
Funding
No funding was received for conducting this study.
Ethics declarations
Conflict of interest
The author has no competing interests to declare that are relevant to the content of this article.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Acknowledgements
The author would like to thank YUKKA Lab, Berlin, for providing the raw data for this research. Furthermore, the author would like to thank Matthew Dixon and Saeed Amen, who provided significant support to the research with their insights and expertise. Finally, the author would like to thank Jörg Osterrieder for his comments and suggestions on this paper.
Appendix A
A.1 The \(\alpha _{t}\)-RIM hyper-parameters
Due to the model's constraints on the hyper-parameters (e.g., the number of RIMs has to be smaller than or equal to the number of k modules), standard cross-validation could not be performed. Therefore, a special function was implemented to generate a list of dictionaries to be fed into the grid search as a parameter grid. The list encompasses the following parameters:
- Units: 2, 4, 6, 8, 10, 12, 14, 16, 18, 20, 25, 30, 35, 40, 45, 50
- Number of RIMs: 4, 6, 8, 10, 12, 14
- K modules: 4, 6, 8, 10, 12, 14
- Input key size: 4, 6, 8, 10, 12
- Input value size: 4, 6, 8, 10, 12
- Input query size: 4, 6, 8, 10, 12
- Input keep probability: 0.6, 0.7, 0.8, 0.9
- Number of communication heads: 2, 4, 6, 8
- Communication key size: 4, 6, 8, 10, 12
- Communication value size: 4, 6, 8, 10, 12
- Communication query size: 4, 6, 8, 10, 12
- Communication keep probability: 0.6, 0.7, 0.8, 0.9
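The constrained grid-generation step described above can be sketched as follows. This is a hypothetical reimplementation over a small subset of the parameters (names such as `make_param_grid` are illustrative); the constraint it enforces is the one stated in the text, namely that the number of RIMs must not exceed the number of k modules.

```python
from itertools import product

def make_param_grid(units, n_rims, k_modules, input_key_sizes):
    """Build a list of hyper-parameter dictionaries for grid search,
    skipping combinations that violate num_rims <= k_modules."""
    grid = []
    for u, n, k, ks in product(units, n_rims, k_modules, input_key_sizes):
        if n > k:  # invalid: more active RIMs than available modules
            continue
        grid.append({
            "units": u,
            "num_rims": n,
            "k_modules": k,
            "input_key_size": ks,
        })
    return grid

# Only valid (num_rims, k_modules) pairs survive the filter:
grid = make_param_grid([2, 4], [4, 6], [4, 6], [4])
```

Pre-filtering the grid this way lets an off-the-shelf grid search consume the list of dictionaries directly, instead of failing mid-training on an invalid configuration.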
A.2 Complete training results
A.2.1 Evaluation metrics
AMAZON
See Tables 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30, 31, 32, 33, 34, 35, 36, 37, 38.
BROWN FORMAN
THERMO FISHER
A.2.2 Re-scaled metrics
AMAZON
BROWN FORMAN
THERMO FISHER
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Königstein, N. Dynamic and context-dependent stock price prediction using attention modules and news sentiment. Digit Finance 5, 449–481 (2023). https://doi.org/10.1007/s42521-023-00089-7