
MCCE: Monte Carlo sampling of valid and realistic counterfactual explanations for tabular data

Published in: Data Mining and Knowledge Discovery

Abstract

We introduce MCCE: Monte Carlo sampling of valid and realistic Counterfactual Explanations for tabular data, a novel counterfactual explanation method that generates on-manifold, actionable and valid counterfactuals by modeling the joint distribution of the mutable features given the immutable features and the decision. Unlike other on-manifold methods, which tend to rely on variational autoencoders and have strict prediction model and data requirements, MCCE handles any type of prediction model and categorical features with more than two levels. MCCE first models the joint distribution of the features and the decision with an autoregressive generative model in which the conditionals are estimated using decision trees. It then samples a large set of observations from this model and finally removes the samples that do not obey certain criteria. We compare MCCE with a range of state-of-the-art on-manifold counterfactual methods on four well-known data sets and show that MCCE outperforms these methods on all common performance metrics and in speed. In particular, including the decision in the modeling process improves the efficiency of the method substantially.
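The three steps described above can be sketched as follows. This is a toy illustration under invented data and settings (the features, the tree configuration, and the leaf-sampling scheme are our assumptions), not the paper's implementation:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)

# Toy stand-in for a tabular data set: two mutable features and a binary decision.
n = 2000
x1 = rng.normal(size=n)
x2 = 0.5 * x1 + rng.normal(scale=0.5, size=n)
d = (x1 + x2 + rng.normal(scale=0.3, size=n) > 0).astype(float)
X = np.column_stack([d, x1, x2])   # decision first, then the mutable features

# Step 1: model the joint distribution autoregressively, one tree per
# conditional: p(x1 | d) and p(x2 | d, x1).
trees = []
for j in range(1, X.shape[1]):
    t = DecisionTreeRegressor(min_samples_leaf=20, random_state=0)
    t.fit(X[:, :j], X[:, j])
    trees.append(t)

# Step 2: sample K observations conditional on the desired decision d = 1,
# drawing each feature from the training values in the leaf reached.
def sample_given_decision(K, d_star=1.0):
    S = np.full((K, X.shape[1]), d_star)
    for j, t in enumerate(trees, start=1):
        train_leaves = t.apply(X[:, :j])
        new_leaves = t.apply(S[:, :j])
        for i in range(K):
            pool = X[train_leaves == new_leaves[i], j]
            S[i, j] = rng.choice(pool)
    return S[:, 1:]

candidates = sample_given_decision(K=100)
# Step 3 (not shown): remove candidates violating validity/actionability criteria.
```

Conditioning on d = 1 biases the sampler toward the desired decision region, which is the effect the abstract's last sentence refers to.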


Data availability and materials

The Adult, FICO, and German Credit data sets can be downloaded from https://github.com/riccotti/Scamander/blob/main/dataset, and the Give Me Some Credit data set from https://www.kaggle.com/c/GiveMeSomeCredit/data.

Code availability

The Python code used in this paper is open source, and can be downloaded at https://github.com/NorskRegnesentral/mccepy. A similar R package is available at https://github.com/NorskRegnesentral/mcceR.

Notes

  1. We use ‘counterfactual explanation’ or ‘CE’ to refer to the literature or explanation type and ‘counterfactual’ or ‘example’ to refer to the instance produced.

  2. A decision is derived from a prediction, \(f(\varvec{x})\), using a pre-defined cutoff value or interval c, characterizing the desired decision. For example, if \(f(\varvec{x}) = 0.39\) and \(c = (0.5, 1]\), then since \(f(\varvec{x}) \notin c\), we give instance \(\varvec{x}\) a decision of 0 and say \(\varvec{x}\) has received an undesirable decision.
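The mapping from prediction to decision in this note can be written out directly; a minimal sketch (the function name and defaults are ours):

```python
def decision(pred, c=(0.5, 1.0)):
    """Binary decision from a prediction f(x) and cutoff interval c = (lo, hi].

    Returns 1 (desirable) if pred lies in the half-open interval (lo, hi],
    else 0 (undesirable), matching the example in the note.
    """
    lo, hi = c
    return 1 if lo < pred <= hi else 0

decision(0.39)  # 0: f(x) = 0.39 is not in c = (0.5, 1], an undesirable decision
```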

  3. In Borisov et al. (2023) the FICO dataset is referred to as the HELOC dataset.

References

  • Antorán J, Bhatt U, Adel T et al (2021) Getting a CLUE: a method for explaining uncertainty estimates. In: International Conference on Learning Representations

  • Borisov V, Seßler K, Leemann T et al (2023) Language models are realistic tabular data generators. In: Proceedings of ICLR 2023

  • Breiman L, Friedman J, Olshen R et al (1984) Classification and regression trees. Chapman and Hall, Boca Raton

  • Brughmans D, Leyman P, Martens D (2023) NICE: an algorithm for nearest instance counterfactual explanations. Data Min Knowl Discov pp 1–39

  • Chen T, Guestrin C (2016) XGBoost: A scalable tree boosting system. In: Proceedings of the 22nd ACM SIGKDD International conference on knowledge discovery and data mining, pp 785–794

  • Chi CM, Vossler P, Fan Y et al (2022) Asymptotic properties of high-dimensional random forests. Ann Stat 50(6):3415–3438

  • Dandl S, Molnar C, Binder M et al (2020) Multi-objective counterfactual explanations. In: International conference on parallel problem solving from nature, Springer, pp 448–469

  • Dhurandhar A, Chen PY, Luss R et al (2018) Explanations based on the missing: towards contrastive explanations with pertinent negatives. In: Proceedings of the 32nd International conference on neural information processing systems, pp 590–601

  • Downs M, Chu JL, Yacoby Y et al (2020) CRUDS: Counterfactual recourse using disentangled subspaces. In: ICML Workshop on human interpretability in machine learning

  • Drechsler J, Reiter JP (2011) An empirical evaluation of easily implemented, nonparametric methods for generating synthetic datasets. Comput Stat Data Anal 55(12):3232–3243

  • Dwork C (2006) Differential privacy. In: 33rd international colloquium automata, languages and programming, ICALP 2006, Venice, Italy, July 10-14, 2006, Proceedings, Part II 33, Springer, pp 1–12

  • Germain M, Gregor K, Murray I et al (2015) MADE: Masked autoencoder for distribution estimation. In: International conference on machine learning, PMLR, pp 881–889

  • Goethals S, Sörensen K, Martens D (2022) The privacy issue of counterfactual explanations: explanation linkage attacks. arXiv preprint arXiv:2210.12051

  • Gomez O, Holter S, Yuan J et al (2020) Vice: Visual counterfactual explanations for machine learning models. In: Proceedings of the 25th International conference on intelligent user interfaces. association for computing machinery, New York, NY, USA, IUI ’20, pp 531–535

  • Guidotti R (2022) Counterfactual explanations and how to find them: literature review and benchmarking. Data Min Knowl Discov pp 1–55

  • Géron A (2019) Hands-on machine learning with Scikit-Learn, Keras, and TensorFlow, 2nd edn. O’Reilly Media Inc, Sebastopol

  • Hastie T, Tibshirani R, Friedman JH et al (2009) The elements of statistical learning: data mining, inference, and prediction, vol 2. Springer, Cham

  • Joshi S, Koyejo O, Vijitbenjaronk W et al (2019) Towards realistic individual recourse and actionable explanations in black-box decision making systems. Safe Machine Learning workshop at ICLR

  • Karimi AH, Barthe G, Balle B et al (2020) Model-agnostic counterfactual explanations for consequential decisions. In: International conference on artificial intelligence and statistics, PMLR, pp 895–905

  • Karimi AH, Barthe G, Schölkopf B et al (2022) A survey of algorithmic recourse: contrastive explanations and consequential recommendations. ACM Comput Surv 55(5):1–29

  • Keane MT, Smyth B (2020) Good counterfactuals and where to find them: A case-based technique for generating counterfactuals for explainable AI (XAI). In: 28th International conference case-based reasoning research and development, ICCBR 2020, Salamanca, Spain, June 8–12, 2020, Proceedings 28, Springer, pp 163–178

  • Laugel T, Lesot MJ, Marsala C et al (2018) Comparison-based inverse classification for interpretability in machine learning. In: International conference on information processing and management of uncertainty in knowledge-based systems, Springer, pp 100–111

  • Mahiou S, Xu K, Ganev G (2022) DPART: Differentially private autoregressive tabular, a general framework for synthetic data generation. arXiv preprint arXiv:2207.05810

  • Mirza M, Osindero S (2014) Conditional generative adversarial nets. arXiv preprint arXiv:1411.1784

  • Mothilal RK, Sharma A, Tan C (2020) Explaining machine learning classifiers through diverse counterfactual explanations. In: Proceedings of the 2020 conference on fairness, accountability, and transparency, pp 607–617

  • Nowok B, Raab GM, Dibben C et al (2016) SYNTHPOP: Bespoke creation of synthetic data in R. J Stat Softw 74(11):1–26

  • Pawelczyk M, Broelemann K, Kasneci G (2020) Learning model-agnostic counterfactual explanations for tabular data. Proc Web Conf 2020:3126–3132

  • Pawelczyk M, Bielawski S, Van den Heuvel J et al (2021) Carla: A python library to benchmark algorithmic recourse and counterfactual explanation algorithms. arXiv preprint arXiv:2108.00783

  • Pawelczyk M, Lakkaraju H, Neel S (2023) On the privacy risks of algorithmic recourse. In: International conference on artificial intelligence and statistics, PMLR, pp 9680–9696

  • Poyiadzi R, Sokol K, Santos-Rodriguez R et al (2020) Face: Feasible and actionable counterfactual explanations. In: Proceedings of the AAAI/ACM conference on AI, ethics, and society, pp 344–350

  • Rasouli P, Chieh Yu I (2022) CARE: Coherent actionable recourse based on sound counterfactual explanations. Int J Data Sci Anal pp 1–26

  • Reiter JP (2005) Using CART to generate partially synthetic public use microdata. J Offl Stat 21(3):441

  • Scornet E, Biau G, Vert JP (2015) Consistency of random forests. Ann Stat 43(4):1716–1741

  • Sklar M (1959) Fonctions de répartition à n dimensions et leurs marges. Publ Inst Stat Univ Paris 8:229–231

  • Stepin I, Alonso JM, Catala A et al (2021) A survey of contrastive and counterfactual explanation generation methods for explainable artificial intelligence. IEEE Access 9:11974–12001

  • Tolomei G, Silvestri F, Haines A et al (2017) Interpretable predictions of tree-based ensembles via actionable feature tweaking. In: Proceedings of the 23rd ACM SIGKDD international conference on knowledge discovery and data mining. Association for Computing Machinery, New York, NY, USA, KDD ’17, pp 465–474

  • Ustun B, Spangher A, Liu Y (2019) Actionable recourse in linear classification. In: Proceedings of the conference on fairness, accountability, and transparency, pp 10–19

  • Verma S, Dickerson JP, Hines K (2021) Counterfactual explanations for machine learning: challenges revisited. CoRR arXiv:abs/2106.07756

  • Wachter S, Mittelstadt B, Russell C (2017) Counterfactual explanations without opening the black box: automated decisions and the GDPR. Harv JL Tech 31:841

  • Wexler J, Pushkarna M, Bolukbasi T et al (2020) The what-if tool: interactive probing of machine learning models. IEEE Trans Vis Comput Graph 26(1):56–65. https://doi.org/10.1109/TVCG.2019.2934619

  • Wilson DR, Martinez TR (1997) Improved heterogeneous distance functions. J Artif Intell Res 6:1–34

  • Xu L, Skoularidou M, Cuesta-Infante A et al (2019) Modeling tabular data using conditional GAN. Adv Neural Inf Process Syst 32


Acknowledgements

This work was supported by the Norwegian Research Council grant 237718 (BigInsight). This paper is supported by the European Union’s HORIZON Research and Innovation Programme under grant agreement No 101120657, project ENFIELD (European Lighthouse to Manifest Trustworthy and Green AI).

Author information


Contributions

AR: Methodology, software, validation, data curation, original draft, writing, review and editing, visualization. MJ: Conceptualization, methodology, software, validation, original draft, writing, review and editing, visualization. KA: Conceptualization, methodology, validation, original draft, writing, review and editing, visualization. AL: Conceptualization, methodology, writing, original draft, review and editing, visualization.

Corresponding author

Correspondence to Martin Jullum.

Ethics declarations

Conflict of interest

The authors declare that they have no conflicts of interest.

Ethical approval

Not applicable.

Consent to participate

Not applicable.

Consent for publication

The authors declare that they provide consent for publication.

Additional information

Responsible editor: Johannes Fürnkranz.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendix A Further experiments

A.1 How does MCCE perform when the categorical features are not binarized?

Since none of the on-manifold methods investigated in the main paper handles categorical features with more than two levels, we had to restrict the method comparisons to data where each categorical feature has only two levels. MCCE, however, handles categorical features with an arbitrary number of levels. To exemplify this, we here provide performance results for MCCE applied to the Adult data set without binarizing the seven categorical features. The results are shown in Table 7.

Since the explanation is carried out on a different data set and model, the performance scores are not directly comparable with those in the main paper. Note, however, that the costs are slightly higher for these counterfactuals, and that the computation time is also higher. Both effects are expected since the data are much richer: one of the categorical features (country) has, for instance, 41 different levels, so changes in the categorical features are more likely. Training the decision trees is also more time-consuming, as many more splits need to be considered.

Table 7 Experiment 5: Average and standard deviation (in parentheses) of performance metrics for counterfactuals generated by MCCE when the categorical features are not binarized

A.2 How does MCCE perform when the prediction model is not gradient-based?

In addition to the restriction to categorical features with two levels, our method comparison required a gradient-based prediction model for all the alternative methods to be applicable. Again, MCCE is not restricted to gradient-based prediction models. To showcase that it is directly applicable to non-gradient-based models, we use MCCE below to explain a random forest model with 200 trees on the Adult data set.
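As an illustration of this setting, a random forest like the one described can be fitted with scikit-learn. The data below are invented placeholders, not the preprocessed Adult data used in the paper:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier

rng = np.random.default_rng(1)
X = rng.normal(size=(1000, 8))            # stand-in for the preprocessed features
y = (X[:, 0] + X[:, 1] > 0).astype(int)   # stand-in for the binary response

# A random forest with 200 trees, as in the experiment described above.
rf = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)

# MCCE needs only predictions, not gradients: the validity of a candidate
# counterfactual is checked via the predicted probability of the desired class.
proba = rf.predict_proba(X[:5])[:, 1]
```

Because only `predict_proba` is called, the same code path works unchanged for any other non-gradient-based classifier.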

The counterfactual method C-CHVAE is the only other on-manifold method in our list of benchmark methods that supposedly handles non-gradient-based models. Thus, it would have been interesting to compare the performance with this method. Unfortunately, a non-gradient-based version of this method is not currently implemented in CARLA.

The average performance metrics for MCCE are reported in Table 8. As in Sect. A.1, the results are not directly comparable. Note, however, that the costs are quite similar to those for the ANN model. The computation time, on the other hand, is quite a bit higher for the random forest model, a result of the increased prediction time (used when computing validity) of the random forest implementation compared to the ANN model.

Table 8 Average and standard deviation (in parentheses) of performance metrics for counterfactuals generated by MCCE when the predictive model is a random forest

A.3 Parameter values for competing methods

For all four data sets, we used CARLA's default parameter values for all the competing methods, except for CLUE, where we had to use other values for the FICO and German Credit data sets. The parameter values used for the different methods are shown in Fig. 10; the values in parentheses are those used for the FICO and German Credit data sets.

Fig. 10

Parameters used for the competing methods

A.4 How do the metrics and computation times compare when we do not condition on the desired decision?

Table 9 Average and standard deviation (in parentheses) of performance metrics for counterfactuals generated with MCCE when we do not condition on the decision

One of the novel contributions of this paper is the modeling of the decision alongside the mutable features, to then condition on the desired decision (and the immutable features) when generating the data set of potential counterfactuals. The idea is to bias the method toward generating samples that are more likely to yield the desired decision. In this section, we investigate whether including the decision in the modeling and then conditioning on the desired decision really yields a higher proportion of valid samples. For the test observations in the Adult data set, we generate counterfactuals without conditioning on the decision for various K and show the usual metrics and run times in Table 9, to be compared with Table 4. As suspected, this variation of MCCE is much less effective at generating valid counterfactuals (\(N_\text {CE}\) in Table 9 is much smaller than in Table 4).

Furthermore, the fitting time exhibits only a slight decrease compared to the regular MCCE approach. Although one might have expected the training time to drop further due to there being fewer candidate features per split, omitting the decision variable can result in larger trees, as the available features convey less information. This was precisely the case for the given data set, and it also led to a slight increase in generation time.

On the other hand, the post-processing time is significantly reduced with this alternative approach: it generates only around 25% unique and valid samples (compared to 86% with regular MCCE), so fewer \(L_0\) and \(L_1\) calculations are required. In total, the computation time is smaller for the MCCE variant that does not condition on the desired decision.

However, it is essential to highlight that this alternative approach requires a significantly higher value of K to generate valid counterfactuals for all 1000 test observations: when not conditioning on the desired decision, a K value of 5000 is needed, compared to only 50 when conditioning on the desired decision. Consequently, this negates the initial speed-up of the former variant, making the original MCCE the clearly superior alternative.
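The post-processing referred to above, keeping unique and valid samples and ranking them by \(L_0\) and then \(L_1\), can be sketched as follows. The function name, threshold, and toy model below are illustrative assumptions, not the paper's implementation:

```python
import numpy as np

def post_process(candidates, x, predict, threshold=0.5):
    """Keep unique, valid samples and return the candidate closest to x.

    candidates: (K, p) array of generated samples
    predict:    function returning the predicted probability of the desired class
    Candidates are ranked by L0 (number of changed features), ties broken by L1.
    """
    cands = np.unique(candidates, axis=0)          # drop duplicate samples
    valid = cands[predict(cands) > threshold]      # keep only valid samples
    if len(valid) == 0:
        return None
    diff = valid - x
    l0 = (np.abs(diff) > 1e-8).sum(axis=1)         # sparsity (changed features)
    l1 = np.abs(diff).sum(axis=1)                  # total absolute change
    return valid[np.lexsort((l1, l0))[0]]          # smallest L0, then smallest L1

# Toy usage: a "model" whose score is just the feature sum.
x = np.zeros(2)
cands = np.array([[1.0, 1.0], [0.0, 0.6], [0.0, 0.6], [0.3, 0.0]])
best = post_process(cands, x, predict=lambda Z: Z.sum(axis=1))
```

Fewer valid samples after the filtering step means fewer rows entering the \(L_0\)/\(L_1\) ranking, which is why the unconditional variant's post-processing is cheaper per K.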

A.5 Quality of generated data

In Sect. 3.3 we showed histograms of the generated data for four of the variables in the FICO data set. Here, we show histograms for the remaining variables. The histograms for the generated data are in white, while those for the real data are in dark grey. As we can see, the marginal distributions of the generated data also match those of the original data very well for these features (Figs. 11, 12, 13, 14, 15).

Fig. 11

Histograms for four of the variables in the generated data set (white) with the histograms for the real data superimposed (dark grey). Where the histograms overlap, the blend of white and dark grey gives a light grey color

Fig. 12

Histograms for four of the variables in the generated data set (white) with the histograms for the real data superimposed (dark grey). Where the histograms overlap, the blend of white and dark grey gives a light grey color

Fig. 13

Histograms for four of the variables in the generated data set (white) with the histograms for the real data superimposed (dark grey). Where the histograms overlap, the blend of white and dark grey gives a light grey color

Fig. 14

Histograms for four of the variables in the generated data set (white) with the histograms for the real data superimposed (dark grey). Where the histograms overlap, the blend of white and dark grey gives a light grey color

Fig. 15

Histograms for three of the variables in the generated data set (white) with the histograms for the real data superimposed (dark grey). Where the histograms overlap, the blend of white and dark grey gives a light grey color

A.6 Simulations mimicking real data experiments

In Sect. 3.4 we performed simulation experiments with a linear model, varying the dimension p (with no fixed features, i.e., \(u=0\)), \(n_{\text {train}}\), \(n_{\text {test}}\) and K. To showcase that these scalability results are relevant in a broader context, we also ran simulations with quantities mimicking the four real data sets in Sect. 3.1. As seen from Table 10, the total computation times in the simulations (27, 50, 7, and 49 s) are similar in magnitude to those recorded in the real data experiments (25, 32, 6, and 34 s), despite the model, the feature dependence, and most likely the tree depth differing significantly. This indicates that the scalability results from our basic simulation study in Sect. 3.4 generalize roughly to real data settings as well.

Table 10 Computation times (in seconds) of the three steps of MCCE when mimicking \(n_{\text {test}}\), \(n_{\text {train}}\), and p/u/q, with \(K=1000\) for the four real data experiments in Sect. 3.1

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

Reprints and permissions

About this article


Cite this article

Redelmeier, A., Jullum, M., Aas, K. et al. MCCE: Monte Carlo sampling of valid and realistic counterfactual explanations for tabular data. Data Min Knowl Disc (2024). https://doi.org/10.1007/s10618-024-01017-y

