Abstract
We introduce MCCE: Monte Carlo sampling of valid and realistic Counterfactual Explanations for tabular data, a novel counterfactual explanation method that generates on-manifold, actionable and valid counterfactuals by modeling the joint distribution of the mutable features given the immutable features and the decision. Unlike other on-manifold methods, which tend to rely on variational autoencoders and impose strict prediction model and data requirements, MCCE handles any type of prediction model and categorical features with more than two levels. MCCE first models the joint distribution of the features and the decision with an autoregressive generative model in which the conditionals are estimated using decision trees. It then samples a large set of observations from this model and finally removes the samples that do not obey certain criteria. We compare MCCE with a range of state-of-the-art on-manifold counterfactual methods on four well-known data sets and show that MCCE outperforms these methods on all common performance metrics and in speed. In particular, including the decision in the modeling process improves the efficiency of the method substantially.
Data availability and materials
The Adult, FICO, and German Credit data sets can be downloaded from https://github.com/riccotti/Scamander/blob/main/dataset and the Give Me Some Credit data set from https://www.kaggle.com/c/GiveMeSomeCredit/data.
Code availability
The Python code used in this paper is open source, and can be downloaded at https://github.com/NorskRegnesentral/mccepy. A similar R package is available at https://github.com/NorskRegnesentral/mcceR.
Notes
We use ‘counterfactual explanation’ or ‘CE’ to refer to the literature or explanation type and ‘counterfactual’ or ‘example’ to refer to the instance produced.
A decision is derived from a prediction, \(f(\varvec{x})\), using a pre-defined cutoff value or interval c, characterizing the desired decision. For example, if \(f(\varvec{x}) = 0.39\) and \(c = (0.5, 1]\), then since \(f(\varvec{x}) \notin c\), we give instance \(\varvec{x}\) a decision of 0 and say \(\varvec{x}\) has received an undesirable decision.
In Borisov et al. (2023) the FICO dataset is referred to as the HELOC dataset.
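The note on decisions above can be made concrete with a short sketch. The function name and the default cutoff interval are illustrative; only the example values \(f(\varvec{x}) = 0.39\) and \(c = (0.5, 1]\) come from the note itself.

```python
# Sketch of how a decision is derived from a prediction f(x) and a
# pre-defined cutoff interval c, following the note above.

def decision(prediction: float, c=(0.5, 1.0)) -> int:
    """Return 1 (desirable decision) if the prediction falls in the
    half-open interval (c[0], c[1]], otherwise 0 (undesirable)."""
    low, high = c
    return int(low < prediction <= high)

# The example from the note: f(x) = 0.39 lies outside c = (0.5, 1],
# so the instance receives the undesirable decision 0.
print(decision(0.39))  # 0
print(decision(0.80))  # 1
```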
References
Antorán J, Bhatt U, Adel T et al (2021) Getting a clue: a method for explaining uncertainty estimates. In: International Conference on Learning Representations
Borisov V, Sessler K, Leemann T et al (2023) Language models are realistic tabular data generators. In: Proceedings of ICLR 2023
Breiman L, Friedman J, Olshen R et al (1984) Classification and regression trees. Chapman and Hall, Boca Raton
Brughmans D, Leyman P, Martens D (2023) NICE: an algorithm for nearest instance counterfactual explanations. Data Min Knowl Discov, pp 1–39
Chen T, Guestrin C (2016) XGBoost: A scalable tree boosting system. In: Proceedings of the 22nd ACM SIGKDD International conference on knowledge discovery and data mining, pp 785–794
Chi CM, Vossler P, Fan Y et al (2022) Asymptotic properties of high-dimensional random forests. Ann Stat 50(6):3415–3438
Dandl S, Molnar C, Binder M et al (2020) Multi-objective counterfactual explanations. In: International conference on parallel problem solving from nature, Springer, pp 448–469
Dhurandhar A, Chen PY, Luss R et al (2018) Explanations based on the missing: towards contrastive explanations with pertinent negatives. In: Proceedings of the 32nd International conference on neural information processing systems, pp 590–601
Downs M, Chu JL, Yacoby Y et al (2020) CRUDS: Counterfactual recourse using disentangled subspaces. In: ICML Workshop on human interpretability in machine learning
Drechsler J, Reiter JP (2011) An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput Stat Data Anal 55(12):3232–3243
Dwork C (2006) Differential privacy. In: 33rd international colloquium automata, languages and programming, ICALP 2006, Venice, Italy, July 10-14, 2006, Proceedings, Part II 33, Springer, pp 1–12
Germain M, Gregor K, Murray I et al (2015) MADE: Masked autoencoder for distribution estimation. In: International conference on machine learning, PMLR, pp 881–889
Goethals S, Sörensen K, Martens D (2022) The privacy issue of counterfactual explanations: explanation linkage attacks. arXiv preprint arXiv:2210.12051
Gomez O, Holter S, Yuan J et al (2020) ViCE: Visual counterfactual explanations for machine learning models. In: Proceedings of the 25th International conference on intelligent user interfaces. Association for Computing Machinery, New York, NY, USA, IUI ’20, pp 531–535
Guidotti R (2022) Counterfactual explanations and how to find them: literature review and benchmarking. Data Min Knowl Discov pp 1–55
Géron A (2019) Hands-on machine learning with Scikit-Learn, Keras, and TensorFlow, 2nd edn. O’Reilly Media Inc, Sebastopol
Hastie T, Tibshirani R, Friedman JH et al (2009) The elements of statistical learning: data mining, inference, and prediction, vol 2. Springer, Cham
Joshi S, Koyejo O, Vijitbenjaronk W et al (2019) Towards realistic individual recourse and actionable explanations in black-box decision making systems. Safe Machine Learning workshop at ICLR
Karimi AH, Barthe G, Balle B et al (2020) Model-agnostic counterfactual explanations for consequential decisions. In: International conference on artificial intelligence and statistics, PMLR, pp 895–905
Karimi AH, Barthe G, Schölkopf B et al (2022) A survey of algorithmic recourse: contrastive explanations and consequential recommendations. ACM Comput Surv 55(5):1–29
Keane MT, Smyth B (2020) Good counterfactuals and where to find them: A case-based technique for generating counterfactuals for explainable AI (XAI). In: 28th International conference case-based reasoning research and development, ICCBR 2020, Salamanca, Spain, June 8–12, 2020, Proceedings 28, Springer, pp 163–178
Laugel T, Lesot MJ, Marsala C et al (2018) Comparison-based inverse classification for interpretability in machine learning. In: International conference on information processing and management of uncertainty in knowledge-based systems, Springer, pp 100–111
Mahiou S, Xu K, Ganev G (2022) DPART: Differentially private autoregressive tabular, a general framework for synthetic data generation. arXiv preprint arXiv:2207.05810
Mirza M, Osindero S (2014) Conditional generative adversarial nets. arXiv preprint arXiv:1411.1784
Mothilal RK, Sharma A, Tan C (2020) Explaining machine learning classifiers through diverse counterfactual explanations. In: Proceedings of the 2020 conference on fairness, accountability, and transparency, pp 607–617
Nowok B, Raab GM, Dibben C et al (2016) SYNTHPOP: Bespoke creation of synthetic data in R. J Stat Softw 74(11):1–26
Pawelczyk M, Broelemann K, Kasneci G (2020) Learning model-agnostic counterfactual explanations for tabular data. Proc Web Conf 2020:3126–3132
Pawelczyk M, Bielawski S, Van den Heuvel J et al (2021) CARLA: A Python library to benchmark algorithmic recourse and counterfactual explanation algorithms. arXiv preprint arXiv:2108.00783
Pawelczyk M, Lakkaraju H, Neel S (2023) On the privacy risks of algorithmic recourse. In: International conference on artificial intelligence and statistics, PMLR, pp 9680–9696
Poyiadzi R, Sokol K, Santos-Rodriguez R et al (2020) Face: Feasible and actionable counterfactual explanations. In: Proceedings of the AAAI/ACM conference on AI, ethics, and society, pp 344–350
Rasouli P, Chieh Yu I (2022) CARE: Coherent actionable recourse based on sound counterfactual explanations. Int J Data Sci Anal pp 1–26
Reiter JP (2005) Using CART to generate partially synthetic public use microdata. J Offl Stat 21(3):441
Scornet E, Biau G, Vert JP (2015) Consistency of random forests. Ann Stat 43(4):1716–1741
Sklar M (1959) Fonctions de répartition à n dimensions et leurs marges. Publ Inst Stat Univ Paris 8:229–231
Stepin I, Alonso JM, Catala A et al (2021) A survey of contrastive and counterfactual explanation generation methods for explainable artificial intelligence. IEEE Access 9:11974–12001
Tolomei G, Silvestri F, Haines A et al (2017) Interpretable predictions of tree-based ensembles via actionable feature tweaking. In: Proceedings of the 23rd ACM SIGKDD international conference on knowledge discovery and data mining. Association for Computing Machinery, New York, NY, USA, KDD ’17, pp 465–474
Ustun B, Spangher A, Liu Y (2019) Actionable recourse in linear classification. In: Proceedings of the conference on fairness, accountability, and transparency, pp 10–19
Verma S, Dickerson JP, Hines K (2021) Counterfactual explanations for machine learning: challenges revisited. arXiv preprint arXiv:2106.07756
Wachter S, Mittelstadt B, Russell C (2017) Counterfactual explanations without opening the black box: automated decisions and the GDPR. Harv JL Tech 31:841
Wexler J, Pushkarna M, Bolukbasi T et al (2020) The what-if tool: interactive probing of machine learning models. IEEE Trans Vis Comput Graph 26(1):56–65. https://doi.org/10.1109/TVCG.2019.2934619
Wilson DR, Martinez TR (1997) Improved heterogeneous distance functions. J Artif Intell Res 6:1–34
Xu L, Skoularidou M, Cuesta-Infante A et al (2019) Modeling tabular data using conditional GAN. Adv Neural Inf Process Syst 32
Acknowledgements
This work was supported by the Norwegian Research Council grant 237718 (BigInsight). This paper is supported by the European Union’s HORIZON Research and Innovation Programme under grant agreement No 101120657, project ENFIELD (European Lighthouse to Manifest Trustworthy and Green AI).
Author information
Authors and Affiliations
Contributions
AR: Methodology, software, validation, data curation, original draft, writing, review and editing, visualization. MJ: Conceptualization, methodology, software, validation, original draft, writing, review and editing, visualization. KA: Conceptualization, methodology, validation, original draft, writing, review and editing, visualization. AL: Conceptualization, methodology, writing, original draft, review and editing, visualization.
Corresponding author
Ethics declarations
Conflict of interest
The authors declare that they have no conflicts of interest.
Ethical approval
Not applicable.
Consent to participate
Not applicable.
Consent for publication
The authors declare that they provide consent for publication.
Additional information
Responsible editor: Johannes Fürnkranz.
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendix A Further experiments
A.1 How does MCCE perform when the categorical features are not binarized?
Since none of the on-manifold methods investigated in the main paper handle categorical features with more than two levels, we had to restrict the method comparisons to data with binary categorical features. MCCE can, however, handle categorical features with an arbitrary number of levels. To exemplify this, we here provide performance results for MCCE applied to the Adult data set without binarizing its seven categorical features. The results are shown in Table 7.
Since the explanations are computed on different data and a different model, the performance scores are not directly comparable with those in the main paper. Note, however, that the costs are slightly higher for these counterfactuals, and that the computation time is also higher. Both effects are expected since the data are much richer: one of the categorical features (country) has, for instance, 41 different levels, so changes in the categorical features are more likely. Training the decision trees is also more time-consuming, as many more splits need to be considered.
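The tree-based conditional sampling that lets MCCE handle multi-level categorical features can be illustrated with a minimal sketch: fit a classification tree for one categorical feature given preceding features, then sample from the empirical class distribution of the leaf a new observation falls into. All data and variable names below are synthetic stand-ins; the actual implementation is in the mccepy package.

```python
# Minimal sketch of autoregressive decision-tree sampling for a single
# categorical feature with four levels (no binarization required).
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)

# Synthetic training data: one numeric predictor and a categorical
# response whose level probabilities depend on the predictor.
n = 2000
x = rng.normal(size=(n, 1))
probs = np.where(x[:, 0] > 0, 0.7, 0.1)
y = np.where(rng.random(n) < probs, 3, rng.integers(0, 3, size=n))

# Estimate the conditional distribution p(y | x) with a classification tree.
tree = DecisionTreeClassifier(max_depth=3).fit(x, y)

def sample_conditional(tree, x_new, rng):
    """Draw y | x by sampling from the leaf-wise empirical class
    frequencies returned by predict_proba."""
    p = tree.predict_proba(x_new)
    return np.array([rng.choice(tree.classes_, p=pi) for pi in p])

x_new = rng.normal(size=(5, 1))
y_new = sample_conditional(tree, x_new, rng)
print(y_new)  # five sampled levels from {0, 1, 2, 3}
```

In the full method, one such tree is fitted per mutable feature, each conditioned on the features (and the decision) earlier in the autoregressive ordering.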
A.2 How does MCCE perform when the prediction model is not gradient-based?
In addition to the restriction to binary categorical features, our method comparison required a gradient-based prediction model for all the alternative methods to be applicable. Again, MCCE has no such restriction. To showcase that it applies directly to non-gradient-based prediction models, we use MCCE below to explain a random forest model with 200 trees on the Adult data set.
The counterfactual method C-CHVAE is the only other on-manifold method among our benchmarks that reportedly handles non-gradient-based models, so a comparison with it would have been interesting. Unfortunately, a non-gradient-based version of this method is not currently implemented in CARLA.
The average performance metrics for MCCE are reported in Table 8. As in Sect. A.1, the results are not directly comparable. Note, however, that the costs are quite similar to those for the ANN model. The computation time, on the other hand, is considerably higher for the random forest model. This is a result of the slower prediction time (used when computing validity) of the random forest implementation compared to the ANN model.
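The validity check referred to above requires only model predictions, which is why any prediction model works. The following sketch shows that step with a 200-tree random forest as the black box; the data, model fit, and desired class are illustrative stand-ins, not the paper's actual Adult setup.

```python
# Hedged sketch of MCCE's validity filtering with a non-gradient-based
# prediction model (a random forest with 200 trees, as in this appendix).
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(1)

# Illustrative training data and black-box model.
X = rng.normal(size=(500, 4))
y = (X[:, 0] + X[:, 1] > 0).astype(int)
model = RandomForestClassifier(n_estimators=200, random_state=1).fit(X, y)

# Candidate counterfactuals (here random; MCCE would sample them from its
# generative model) are kept only if predicted in the desired class 1.
candidates = rng.normal(size=(1000, 4))
valid = candidates[model.predict(candidates) == 1]
print(f"{valid.shape[0]} of {candidates.shape[0]} candidates are valid")
```

Because only `model.predict` is called, no gradients of the prediction model are ever needed.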
A.3 Parameter values for competing methods
For all four data sets, we used CARLA's default parameter values for all the competing methods, except for CLUE, where we had to use other values for the FICO and German Credit data sets. The parameter values used for the different methods are shown in Fig. 10; the values in parentheses are those used for the FICO and German Credit data sets.
A.4 How do the metrics and computation times compare when we do not condition on the desired decision?
One of the novel contributions of this paper is the modeling of the decision alongside the mutable features, to then condition on the desired decision (and the immutable features) when generating the data set of potential counterfactuals. The idea is to bias the method toward generating samples that are more likely to yield the desired decision. In this section, we investigate whether including the decision in the modeling and then conditioning on the desired decision really ensures a higher proportion of valid samples. For the test observations in the Adult data set, we generate counterfactuals without conditioning on the decision for various K and report the usual metrics and run times in Table 9, to be compared with Table 4. As suspected, this variant of MCCE is much less effective at generating valid counterfactuals (\(N_\text {CE}\) in Table 9 is much smaller than in Table 4).
Furthermore, the fitting and generation times exhibit a slight decrease compared to the regular MCCE approach. Although one might have expected the training time to decrease due to fewer candidate features per split, the omission of the decision variable can result in larger trees as the available features convey less information. This was precisely the case for the given dataset, also leading to a slight increase in generation time.
On the other hand, the post-processing time is significantly reduced with this alternative approach, as it only generates around 25% unique and valid samples (compared to 86% with the normal MCCE), requiring fewer \(L_0\) and \(L_1\) calculations. In total, for a given K, the computation time is smaller for the MCCE variant that does not condition on the desired decision.
However, it is essential to highlight that this alternative approach requires a significantly higher value of K to generate valid counterfactuals for all 1000 test observations: a K value of 5000 is needed when not conditioning on the desired decision, compared to only 50 when conditioning on it. Consequently, this negates the initial speed-up of the former variant, making the original MCCE a clearly superior alternative.
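The \(L_0\) and \(L_1\) calculations mentioned above rank the valid samples by closeness to the instance being explained. The sketch below is one illustrative reading of that step, selecting the sample with the fewest changed features and breaking ties by total absolute change; the function name, tolerance, and exact tie-breaking order are assumptions, not the paper's precise procedure.

```python
# Sketch of selecting a counterfactual among valid samples using the
# L0 (sparsity) and L1 (total absolute change) distances to x.
import numpy as np

def pick_counterfactual(x, valid_samples, tol=1e-8):
    diff = valid_samples - x
    l0 = (np.abs(diff) > tol).sum(axis=1)  # number of features changed
    l1 = np.abs(diff).sum(axis=1)          # total absolute change
    # Lexicographic order: smallest L0 first, L1 breaks ties.
    best = np.lexsort((l1, l0))[0]
    return valid_samples[best]

x = np.array([1.0, 0.0, 2.0])
valid = np.array([[1.0, 0.0, 3.5],   # L0=1, L1=1.5
                  [0.5, 0.5, 2.0],   # L0=2, L1=1.0
                  [1.0, 1.0, 2.5]])  # L0=2, L1=1.5
print(pick_counterfactual(x, valid))  # -> [1.  0.  3.5]
```

The first sample wins despite its larger L1 because it changes only one feature, which is the sparsity-first behavior the metrics reward.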
A.5 Quality of generated data
In Sect. 3.3 we showed histograms of the generated data for four of the variables in the FICO data set. Here, we show the histograms for the rest of the variables. The histograms for the generated data are in white, while those for the real data are in dark grey. As we can see, the marginal distributions of the generated data also match the original ones very well for these features (Figs. 11, 12, 13, 14, 15).
A.6 Simulations mimicking real data experiments
In Sect. 3.4 we performed simulation experiments with a linear model while varying the dimension p (with no fixed features, i.e., \(u=0\)), \(n_{\text {train}}\), \(n_{\text {test}}\) and K. To showcase that these scalability results are relevant and valid in a broader context, we also ran simulations with quantities mimicking the four real data sets in Sect. 3.1. As seen from the table, the total computation times in the simulations (27, 50, 7, and 49 s) are similar in magnitude to those recorded in the real data experiments (25, 32, 6, and 34 s), despite the model, the feature dependence, and most likely the tree depth differing substantially. This indicates that the scalability observed in our basic simulation study in Sect. 3.4 carries over roughly to real data settings as well.
Rights and permissions
Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.
About this article
Cite this article
Redelmeier, A., Jullum, M., Aas, K. et al. MCCE: Monte Carlo sampling of valid and realistic counterfactual explanations for tabular data. Data Min Knowl Disc (2024). https://doi.org/10.1007/s10618-024-01017-y
Received:
Accepted:
Published:
DOI: https://doi.org/10.1007/s10618-024-01017-y