Explainable AI for Financial Forecasting

  • Conference paper
Machine Learning, Optimization, and Data Science (LOD 2021)

Abstract

Feature engineering is one of the most important steps in applying machine learning. It plays a key role in identifying features that can effectively help model a given classification or regression task. The process is usually not trivial and may require developing handcrafted features. In the financial domain, this step is even more complex because features extracted from financial data generally correlate only weakly with their associated labels. This makes feature selection a challenging task, one that can now be explored through the explainable artificial intelligence approaches that have recently appeared in the literature. This paper examines the potential of automatic, machine-learning-based feature selection to support decisions in financial forecasting. Using explainable artificial intelligence methods, we develop different feature selection strategies in an applied financial setting where the goal is to predict next-day returns for a set of input stocks. We propose identifying the relevant features for each stock individually, which accounts for the heterogeneous behavior of stocks. We demonstrate that our approach can separate important features from unimportant ones and improves prediction performance, as shown by comparisons between our proposed strategies and several state-of-the-art baselines on real-world financial time series.
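
The per-stock selection idea in the abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and not the authors' pipeline: the least-squares model, the permutation-importance score (the increase in validation error when one feature column is shuffled), the helper names `fit_linear`, `permutation_importance`, `select_features_per_stock`, and `make_stock`, the tickers `AAA` and `BBB`, and the synthetic data.

```python
import numpy as np


def fit_linear(X, y):
    """Least-squares fit with an intercept; returns a predict function."""
    A = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return lambda Z: np.column_stack([np.ones(len(Z)), Z]) @ coef


def permutation_importance(predict, X, y, rng, n_repeats=10):
    """Mean increase in MSE when each feature column is shuffled in turn."""
    base = np.mean((predict(X) - y) ** 2)
    scores = np.zeros(X.shape[1])
    for k in range(X.shape[1]):
        for _ in range(n_repeats):
            Xp = X.copy()
            Xp[:, k] = rng.permutation(Xp[:, k])
            scores[k] += np.mean((predict(Xp) - y) ** 2) - base
    return scores / n_repeats


def select_features_per_stock(data, top_k, seed=0):
    """Pick the top_k features separately for every stock.

    `data` maps a ticker to (X_train, y_train, X_val, y_val).
    """
    rng = np.random.default_rng(seed)
    selected = {}
    for ticker, (X_tr, y_tr, X_va, y_va) in data.items():
        predict = fit_linear(X_tr, y_tr)
        scores = permutation_importance(predict, X_va, y_va, rng)
        top = np.argsort(scores)[::-1][:top_k]
        selected[ticker] = sorted(top.tolist())
    return selected


def make_stock(rng, informative, n=400, n_features=5):
    """Synthetic stand-in for one stock: only `informative` columns matter."""
    X = rng.normal(size=(n, n_features))
    y = X[:, informative].sum(axis=1) + 0.1 * rng.normal(size=n)
    half = n // 2  # first half for fitting, second half for scoring
    return X[:half], y[:half], X[half:], y[half:]


rng = np.random.default_rng(42)
data = {"AAA": make_stock(rng, [0, 3]), "BBB": make_stock(rng, [1, 4])}
selected = select_features_per_stock(data, top_k=2)
print(selected)
```

Because each hypothetical stock is driven by a different pair of columns, scoring features per stock rather than on pooled data lets the selected subsets differ between stocks. Real return series are far noisier than this toy data, so the separation between important and unimportant features would be much less clean.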


Notes

  1. https://en.wikipedia.org/wiki/S%26P_100.

  2. https://en.wikipedia.org/wiki/CAC_40.

  3. https://en.wikipedia.org/wiki/FTSE_100_Index.

  4. https://en.wikipedia.org/wiki/S%26P_Asia_50.

  5. We use publicly available data from Yahoo Finance.

  6. For example, \(j=1\) denotes the return in the previous day for each observation, whereas \(j=252\) denotes the return in the past year, considering that there are 252 trading days in a year.

  7. https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.RandomForestRegressor.html#sklearn.ensemble.RandomForestRegressor.feature_importances_.

  8. https://github.com/marcotcr/lime/tree/master/lime.
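
The look-back index \(j\) in note 6 can be illustrated with a short sketch. The return definition used in the paper is not visible on this page, so the simple return `prices[t] / prices[t - j] - 1` is an assumption, as are the helper name `lagged_returns` and the toy closing prices.

```python
def lagged_returns(prices, lags):
    """Return {j: list of j-day simple returns}, aligned with `prices`.

    Entry t of the list for lag j is prices[t] / prices[t - j] - 1,
    or None when fewer than j earlier observations exist.
    """
    features = {}
    for j in lags:
        if j < 1:
            raise ValueError("lags must be positive integers")
        features[j] = [
            prices[t] / prices[t - j] - 1.0 if t >= j else None
            for t in range(len(prices))
        ]
    return features


# Hypothetical closing prices; real data would span at least 253 days
# before the j = 252 (one trading year) feature becomes available.
closes = [100.0, 110.0, 99.0, 108.9]
feats = lagged_returns(closes, lags=[1, 2])
print(feats[1])  # one-day returns
print(feats[2])  # two-day returns
```

With daily closes, `lags=[1, 252]` would give the previous-day and past-year returns that the note mentions.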


Acknowledgements

The research performed in this paper has been supported by the “Bando “Aiuti per progetti di Ricerca e Sviluppo” - POR FESR 2014-2020-Asse 1, Azione 1.1.3, Strategy 2- Program 3, Project AlmostAnOracle - AI and Big Data Algorithms for Financial Time Series Forecasting”.

Author information

Corresponding author

Correspondence to Alessandro Sebastian Podda.

Copyright information

© 2022 Springer Nature Switzerland AG

About this paper

Cite this paper

Carta, S., Podda, A.S., Reforgiato Recupero, D., Stanciu, M.M. (2022). Explainable AI for Financial Forecasting. In: Nicosia, G., et al. Machine Learning, Optimization, and Data Science. LOD 2021. Lecture Notes in Computer Science, vol 13164. Springer, Cham. https://doi.org/10.1007/978-3-030-95470-3_5

  • DOI: https://doi.org/10.1007/978-3-030-95470-3_5

  • Publisher Name: Springer, Cham

  • Print ISBN: 978-3-030-95469-7

  • Online ISBN: 978-3-030-95470-3

  • eBook Packages: Computer Science, Computer Science (R0)
