Explainable AI for Financial Forecasting

Carta, Salvatore; Podda, Alessandro Sebastian; Reforgiato Recupero, Diego; Stanciu, Maria Madalina

doi:10.1007/978-3-030-95470-3_5

Part of the book series: Lecture Notes in Computer Science (LNISA, volume 13164)

Included in the following conference series:

International Conference on Machine Learning, Optimization, and Data Science


Abstract

Feature engineering is one of the most important steps in applying machine learning. It plays a key role in identifying features that help model a given classification or regression task. The process is usually nontrivial and often leads to handcrafted features. In the financial domain, this step is even harder because features extracted from financial data are generally only weakly correlated with their associated labels. Explainable artificial intelligence approaches that have recently appeared in the literature offer a way to address this challenge. This paper examines the potential of automatic, machine-learning-based feature selection to support decisions in financial forecasting. Using explainable artificial intelligence methods, we develop several feature selection strategies in an applied financial setting, where the goal is to predict next-day returns for a set of input stocks. We propose identifying the relevant features for each stock individually, which accounts for the heterogeneous behavior of stocks. We demonstrate that our approach can separate important features from unimportant ones and improve prediction performance, as shown by comparisons between our proposed strategies and several state-of-the-art baselines on real-world financial time series.
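The abstract does not name the specific explainability methods or the forecasting model, so the following is only a minimal sketch of the per-stock selection idea under stated assumptions: permutation importance on a held-out window stands in for the explainability method, an ordinary least-squares model stands in for the forecaster, and the tickers and data are synthetic. The helper names (`fit_linear`, `permutation_importance`, `select_features_per_stock`, `make_stock`) are hypothetical and do not come from the paper.

```python
import numpy as np

def fit_linear(X, y):
    """Least-squares fit with intercept; returns a predict function.
    Assumption: a linear model stands in for the actual forecaster."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return lambda Z: np.column_stack([np.ones(len(Z)), Z]) @ coef

def permutation_importance(predict, X, y, rng, n_repeats=20):
    """Mean increase in MSE when a single feature column is shuffled."""
    base = np.mean((predict(X) - y) ** 2)
    scores = np.zeros(X.shape[1])
    for j in range(X.shape[1]):
        for _ in range(n_repeats):
            Xp = X.copy()
            Xp[:, j] = rng.permutation(Xp[:, j])
            scores[j] += np.mean((predict(Xp) - y) ** 2) - base
    return scores / n_repeats

def select_features_per_stock(data, top_k, seed=0):
    """data maps ticker -> (X_train, y_train, X_val, y_val).
    Each stock gets its own model and its own top_k feature indices."""
    rng = np.random.default_rng(seed)
    selected = {}
    for ticker, (X_tr, y_tr, X_va, y_va) in data.items():
        predict = fit_linear(X_tr, y_tr)
        imp = permutation_importance(predict, X_va, y_va, rng)
        top = np.argsort(imp)[::-1][:top_k]
        selected[ticker] = sorted(top.tolist())
    return selected

def make_stock(rng, informative, n=400, n_features=5):
    """Synthetic stock: the target depends only on the 'informative' columns.
    The first half is used for fitting, the second half for validation."""
    X = rng.normal(size=(n, n_features))
    y = X[:, informative].sum(axis=1) + 0.1 * rng.normal(size=n)
    half = n // 2
    return X[:half], y[:half], X[half:], y[half:]

rng = np.random.default_rng(42)
# Hypothetical tickers with different informative features per stock.
data = {"AAA": make_stock(rng, [0, 3]), "BBB": make_stock(rng, [1, 4])}
selected = select_features_per_stock(data, top_k=2)
print(selected)
```

Because the two synthetic stocks depend on different columns, a single global ranking would blur their signals, whereas the per-stock loop recovers a separate feature subset for each. The split is chronological (fit on the earlier half, score on the later half) to avoid looking ahead in a time series.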


Notes

1. https://en.wikipedia.org/wiki/S%26P_100.
2. https://en.wikipedia.org/wiki/CAC_40.
3. https://en.wikipedia.org/wiki/FTSE_100_Index.
4. https://en.wikipedia.org/wiki/S%26P_Asia_50.
5. We use publicly available data from Yahoo Finance.
6. For example, \(j=1\) denotes the return in the previous day for each observation, whereas \(j=252\) denotes the return in the past year, considering that there are 252 trading days in a year.
7. https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestRegressor.html#sklearn.ensemble.RandomForestRegressor.feature_importances_.
8. https://github.com/marcotcr/lime/tree/master/lime.
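Note 6 describes lagged-return features indexed by \(j\) but does not give the formula. As a rough illustration, assuming the simple return \(P_t / P_{t-j} - 1\) over the \(j\) trading days ending at day \(t\), such features could be computed as below. The function names and the price series are hypothetical.

```python
def lagged_return(prices, t, j):
    """Simple return over the j trading days ending at index t.
    Assumption: R = P_t / P_{t-j} - 1; the paper's exact definition may differ."""
    if j < 1 or t - j < 0:
        raise ValueError("need 1 <= j <= t")
    return prices[t] / prices[t - j] - 1.0

def lagged_return_features(prices, t, lags):
    """Map each lag j to the j-day return observed at index t."""
    return {j: lagged_return(prices, t, j) for j in lags}

# Hypothetical daily closing prices, oldest first.
prices = [100.0, 102.0, 101.0, 105.0, 110.0]
features = lagged_return_features(prices, t=4, lags=[1, 2, 4])
print(features)
```

With a full price history, `lags` could include 252 to obtain the one-year return mentioned in the note; the short series here only supports lags up to 4.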


Acknowledgements

The research performed in this paper has been supported by the “Bando ‘Aiuti per progetti di Ricerca e Sviluppo’ - POR FESR 2014-2020 - Asse 1, Azione 1.1.3, Strategy 2 - Program 3, Project AlmostAnOracle - AI and Big Data Algorithms for Financial Time Series Forecasting”.

Author information

Authors and Affiliations

Department of Mathematics and Computer Science, University of Cagliari, Cagliari, Italy
Salvatore Carta, Alessandro Sebastian Podda, Diego Reforgiato Recupero & Maria Madalina Stanciu

Corresponding author

Correspondence to Alessandro Sebastian Podda.

Editor information

Editors and Affiliations

University of Catania, Catania, Italy
Giuseppe Nicosia
Department of Computer Science, University of Reading, Reading, UK
Varun Ojha
Department of Computer Science, University of Oxford, Oxford, UK
Emanuele La Malfa
Cambridge Judge Business School, University of Cambridge, Cambridge, UK
Gabriele La Malfa
Department of Biochemistry, University of Cambridge, Cambridge, UK
Giorgio Jansen
Department of Industrial and Systems Engineering, University of Florida, Gainesville, FL, USA
Panos M. Pardalos
University of Catania, Catania, Italy
Giovanni Giuffrida
Department of Informatics, Dana-Farber Cancer Institute, Boston, MA, USA
Renato Umeton

About this paper

Cite this paper

Carta, S., Podda, A.S., Reforgiato Recupero, D., Stanciu, M.M. (2022). Explainable AI for Financial Forecasting. In: Nicosia, G., et al. Machine Learning, Optimization, and Data Science. LOD 2021. Lecture Notes in Computer Science, vol 13164. Springer, Cham. https://doi.org/10.1007/978-3-030-95470-3_5

DOI: https://doi.org/10.1007/978-3-030-95470-3_5
Published: 02 February 2022
Publisher Name: Springer, Cham
Print ISBN: 978-3-030-95469-7
Online ISBN: 978-3-030-95470-3
eBook Packages: Computer Science; Computer Science (R0)
