Abstract
The ability to correctly interpret a prediction model’s output is critically important in many problem domains. Accurate interpretation builds user trust in the model, provides insight into how the model may be improved, and supports understanding of the process being modeled. The absence of this capability has kept algorithmic trading from making use of more powerful predictive models, such as XGBoost and Random Forests. Recently, the adaptation of coalitional game theory has led to the development of consistent methods of determining feature importance for these models (SHAP). This study designs and tests a novel method of integrating the capabilities of SHAP into predictive models for algorithmic trading.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendix
Random Forests were selected for this study because their parameters are much easier to tune than those of XGBoost. Random Forests contain adjustable hyperparameters whose values can dramatically affect performance, and this flexibility contributes to their robustness. Common hyperparameters include the number of trees in the forest, the maximum depth of each tree, and the minimum number or proportion of samples in a leaf. Random search for the best hyperparameter values avoids the exhaustive enumeration of grid search, instead evaluating random subsets of hyperparameter combinations. Random Forests are also less prone to overfitting than XGBoost.
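The random search described above can be sketched with scikit-learn, which is assumed here for illustration; the candidate values below are hypothetical, not the configuration used in the study.

```python
# Sketch: random search over Random Forest hyperparameters (scikit-learn assumed;
# the search space and data are illustrative only).
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import RandomizedSearchCV

X, y = make_regression(n_samples=200, n_features=10, random_state=0)

# Candidate values for the common hyperparameters named above.
param_distributions = {
    "n_estimators": [50, 100, 200],       # number of trees in the forest
    "max_depth": [None, 5, 10],           # maximum depth of each tree
    "min_samples_leaf": [1, 5, 0.05],     # count or proportion of samples per leaf
}

# Evaluate a random subset of combinations rather than enumerating the full grid.
search = RandomizedSearchCV(
    RandomForestRegressor(random_state=0),
    param_distributions,
    n_iter=5,          # only 5 of the 27 combinations are tried
    cv=3,
    random_state=0,
)
search.fit(X, y)
print(search.best_params_)
```

Increasing `n_iter` trades computation for a more thorough exploration of the space; the key saving over grid search is that the cost is fixed by `n_iter`, not by the product of the candidate-list sizes.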
A Random Forest is an ensemble of unpruned classification or regression trees induced from bootstrap samples of the training data, using random feature selection in the tree induction process. Prediction is made by aggregating (majority vote for classification, averaging for regression) the predictions of the ensemble. Formally, an ensemble of \( K \) trees is an estimator comprised of a collection of randomized trees \( \{ h\left( {x, \theta_{k} } \right), k = 1, \ldots , K \} \), where the \( \theta_{k} \) are independent identically distributed random vectors and \( x \) is an input vector. Letting \( \theta \) represent a generic random vector with the same distribution as each \( \theta_{k} \), as \( K \) goes to infinity the mean-squared generalization error of the random forest converges almost surely to that of \( E_{\theta}\left[ {h\left( {x, \theta } \right)} \right] \), thus mitigating the possibility of overfitting the model.
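The averaging behavior can be illustrated with a minimal sketch, assuming scikit-learn and synthetic data: each \( \theta_{k} \) is realized here as a bootstrap resampling, and the aggregated prediction is the mean over the \( K \) trees. (Real forests also randomize features per split; this toy version isolates the averaging step.)

```python
# Sketch: aggregate K randomized regression trees, each grown on a bootstrap
# sample (theta_k = the resampling), and measure test mean-squared error.
# Illustrative only; omits per-split feature randomization.
import numpy as np
from sklearn.datasets import make_regression
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X, y = make_regression(n_samples=300, n_features=5, noise=10.0, random_state=0)
X_train, y_train, X_test, y_test = X[:200], y[:200], X[200:], y[200:]

def forest_mse(K):
    """Average the predictions of K bootstrap-trained trees; return test MSE."""
    preds = np.zeros(len(X_test))
    for k in range(K):
        idx = rng.integers(0, len(X_train), len(X_train))  # bootstrap sample
        tree = DecisionTreeRegressor(random_state=k).fit(X_train[idx], y_train[idx])
        preds += tree.predict(X_test)
    return float(np.mean((preds / K - y_test) ** 2))

# Generalization error typically falls, then stabilizes, as K grows.
mse_1, mse_50 = forest_mse(1), forest_mse(50)
print(mse_1, mse_50)
```

The stabilization as \( K \) grows mirrors the almost-sure convergence noted above: adding trees cannot overfit in the way deepening a single tree can.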
Random forests increase diversity among the classifiers by altering the feature sets over the different tree induction processes and resampling the data. The procedure to build a forest with \( K \) trees is as follows. For \( k = 1, \ldots , K \): (1) draw a bootstrap sample from the training data; (2) grow an unpruned tree on this sample, *evaluating only a random subset of the features at each node and splitting on the best feature in that subset*; (3) add the tree to the ensemble. Predictions are then aggregated across the \( K \) trees as described above.
The random feature-subset selection at each node is where random forests depart from the normal bagging procedure. Specifically, when building a decision tree using traditional bagging, the best feature is selected from the full feature set \( F \) at each node, and this set does not change over the different runs of the induction procedure. Conversely, with random forests a different random subset of size \( g\left( {\left| F \right|} \right) \) is evaluated at each node (e.g., \( g\left( x \right) = 0.15x \) or \( g\left( x \right) = \sqrt{x} \), etc.), with the best feature selected from this subset. This has been shown to increase diversity among the trees.
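In scikit-learn, which is assumed here for illustration, the subset size \( g\left( {\left| F \right|} \right) \) corresponds to the `max_features` parameter: a float gives a proportion (e.g., \( g(x) = 0.15x \)), the string `"sqrt"` gives \( g(x) = \sqrt{x} \), and `None` evaluates every feature at every node, recovering plain bagging. A minimal sketch:

```python
# Sketch: varying the per-node feature subset size g(|F|) via max_features.
# Illustrative values only; data is synthetic.
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

X, y = make_classification(n_samples=200, n_features=20, random_state=0)

for mf in (0.15, "sqrt", None):  # None = all features per node (bagging-like)
    clf = RandomForestClassifier(n_estimators=50, max_features=mf, random_state=0)
    clf.fit(X, y)
    print(mf, clf.score(X, y))
```

Smaller subsets make the individual trees weaker but less correlated with one another, which is the diversity effect the passage describes.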
Hansen, J.V. Coalition Feature Interpretation and Attribution in Algorithmic Trading Models. Comput Econ 58, 849–866 (2021). https://doi.org/10.1007/s10614-020-10053-x