Causal Inference in Data Analysis with Applications to Fairness and Explanations

Roy, Sudeepa; Salimi, Babak

doi:10.1007/978-3-031-31414-8_3

Part of the book series: Lecture Notes in Computer Science (LNCS, volume 13759)

Abstract

Causal inference is a fundamental concept that goes beyond simple correlation and model-based prediction analysis, and is highly relevant in domains such as health, medicine, and the social sciences. Causal inference enables the estimation of the impact of an intervention or treatment on the world, making it critical for sound and robust policy making. However, randomized controlled experiments, which are typically considered the gold standard for drawing causal conclusions, are often not feasible due to ethical, cost, or other constraints. Fortunately, there is a rich literature in Artificial Intelligence (AI), Machine Learning (ML), and Statistics on observational studies, which are methods for causal inference on observed or collected data under certain assumptions. In this paper, we provide an overview of popular formal and rigorous techniques for causal inference on observed data from the AI and Statistics literature. Furthermore, we discuss how concepts from causal inference can be used to reason about fairness and enable explainability in machine learning models, which are critical in responsible data science when ML is used in making high-stakes decisions in various contexts. Our discussion highlights the importance of using causal inference in ML models and provides insights on how to develop more transparent and responsible AI systems.

Notes

1. For proof and intuition behind the back-door criterion, along with other sufficient conditions, see [55].

References

New York Times article on crime and summer (2009). https://www.nytimes.com/2009/06/19/nyregion/19murder.html?smid=url-share
Alvarez-Melis, D., Jaakkola, T.S.: On the robustness of interpretability methods. arXiv preprint arXiv:1806.08049 (2018)
Apley, D.W., Zhu, J.: Visualizing the effects of predictor variables in black box supervised learning models. arXiv preprint arXiv:1612.08468 (2016)
Avin, C., Shpitser, I., Pearl, J.: Identifiability of path-specific effects (2005)
Awan, M.U., Morucci, M., Orlandi, V., Roy, S., Rudin, C., Volfovsky, A.: Almost-matching-exactly for treatment effect estimation under network interference. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020, 26–28 August 2020, Online, Palermo, Sicily, Italy. Proceedings of Machine Learning Research, vol. 108, pp. 3252–3262. PMLR (2020)
Bickel, P.J., Hammel, E.A., O’Connell, J., et al.: Sex bias in graduate admissions: data from Berkeley. Science 187(4175), 398–404 (1975)
Chiappa, S.: Path-specific counterfactual fairness. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 33, pp. 7801–7808 (2019)
Chouldechova, A.: Fair prediction with disparate impact: a study of bias in recidivism prediction instruments. Big Data 5(2), 153–163 (2017)
Corbett-Davies, S., Pierson, E., Feller, A., Goel, S., Huq, A.: Algorithmic decision making and the cost of fairness. In: Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 797–806. ACM (2017)
Cox, D.R.: The regression analysis of binary sequences (with discussion). J. Roy. Stat. Soc. B 20, 215–242 (1958)
De Graaf, M.M.A., Malle, B.F.: How people explain action (and autonomous intelligent systems should too). In: 2017 AAAI Fall Symposium Series (2017)
Dieng, A., Liu, Y., Roy, S., Rudin, C., Volfovsky, A.: Interpretable almost-exact matching for causal inference. In: Chaudhuri, K., Sugiyama, M. (eds.) The 22nd International Conference on Artificial Intelligence and Statistics, AISTATS 2019, 16–18 April 2019, Naha, Okinawa, Japan. Proceedings of Machine Learning Research, vol. 89, pp. 2445–2453. PMLR (2019)
Dwork, C., Hardt, M., Pitassi, T., Reingold, O., Zemel, R.S.: Fairness through awareness. In: ITCS, pp. 214–226. ACM (2012)
Fisher, A., Rudin, C., Dominici, F.: Model class reliance: variable importance measures for any machine learning model class, from the “Rashomon” perspective. arXiv preprint arXiv:1801.01489, p. 68 (2018)
Fisher, R.A.: The Design of Experiments. Oliver and Boyd, Oxford (1935)
Friedman, J.H.: Greedy function approximation: a gradient boosting machine. Ann. Stat. 1189–1232 (2001)
Frye, C., Feige, I., Rowat, C.: Asymmetric Shapley values: incorporating causal knowledge into model-agnostic explainability. arXiv preprint arXiv:1910.06358 (2019)
Funk, M.J., Westreich, D., Wiesen, C., Stürmer, T., Brookhart, M.A., Davidian, M.: Doubly robust estimation of causal effects. Am. J. Epidemiol. 173, 761–767 (2011)
Galhotra, S., Brun, Y., Meliou, A.: Fairness testing: testing software for discrimination. In: Proceedings of the 2017 11th Joint Meeting on Foundations of Software Engineering, pp. 498–510. ACM (2017)
Galhotra, S., Pradhan, R., Salimi, B.: Explaining black-box algorithms using probabilistic contrastive counterfactuals. In: Proceedings of the International Conference on Management of Data, pp. 577–590 (2021)
Gerstenberg, T., Goodman, N.D., Lagnado, D.A., Tenenbaum, J.B.: How, whether, why: causal judgments as counterfactual contrasts. In: CogSci (2015)
Goldstein, A., Kapelner, A., Bleich, J., Pitkin, E.: Peeking inside the black box: visualizing statistical learning with plots of individual conditional expectation. J. Comput. Graph. Stat. 24(1), 44–65 (2015)
Greenwell, B.M., Boehmke, B.C., McCarthy, A.J.: A simple and effective model-based variable importance measure. arXiv preprint arXiv:1805.04755 (2018)
Grynaviski, E.: Contrasts, counterfactuals, and causes. Eur. J. Int. Rel. 19(4), 823–846 (2013)
Guidotti, R., Monreale, A., Ruggieri, S., Turini, F., Giannotti, F., Pedreschi, D.: A survey of methods for explaining black box models. ACM Comput. Surv. (CSUR) 51(5), 1–42 (2018)
Hahn, P.R., Murray, J.S., Carvalho, C.: Bayesian regression tree models for causal inference: regularization, confounding, and heterogeneous effects (2017)
Hernán, M.A., Robins, J.M.: Causal inference (2010)
Heskes, T., Sijben, E., Bucur, I.G., Claassen, T.: Causal Shapley values: exploiting causal knowledge to explain individual predictions of complex models. arXiv preprint arXiv:2011.01625 (2020)
Holland, P.W.: Statistics and causal inference. J. Am. Stat. Assoc. 81(396), 945–960 (1986)
Hooker, G.: Discovering additive structure in black box functions. In: Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 575–580 (2004)
Hooker, G., Mentch, L.: Please stop permuting features: an explanation and alternatives. arXiv preprint arXiv:1905.03151 (2019)
Iacus, S.M., King, G., Porro, G., Katz, J.N.: Causal inference without balance checking: coarsened exact matching. Polit. Anal. 1–24 (2012)
Imbens, G.W., Rubin, D.B.: Causal Inference in Statistics, Social, and Biomedical Sciences. Cambridge University Press, Cambridge (2015)
Islam, M.T., Fariha, A., Meliou, A., Salimi, B.: Through the data management lens: experimental analysis and evaluation of fair classification. In: Proceedings of the 2022 International Conference on Management of Data, pp. 232–246 (2022)
Karimi, A.-H., Barthe, G., Balle, B., Valera, I.: Model-agnostic counterfactual explanations for consequential decisions. arXiv preprint arXiv:1905.11190 (2019)
Karimi, A.-H., Barthe, G., Schölkopf, B., Valera, I.: A survey of algorithmic recourse: contrastive explanations and consequential recommendations. ACM Comput. Surv. 55(5), 1–29 (2022)
Karimi, A.-H., von Kügelgen, J., Schölkopf, B., Valera, I.: Algorithmic recourse under imperfect causal knowledge: a probabilistic approach. arXiv preprint arXiv:2006.06831 (2020)
Kilbertus, N., Carulla, M.R., Parascandolo, G., Hardt, M., Janzing, D., Schölkopf, B.: Avoiding discrimination through causal reasoning. In: Advances in Neural Information Processing Systems, pp. 656–666 (2017)
Kumar, I.E., Venkatasubramanian, S., Scheidegger, C., Friedler, S.: Problems with Shapley-value-based explanations as feature importance measures. In: International Conference on Machine Learning, pp. 5491–5500. PMLR (2020)
Kusner, M.J., Loftus, J., Russell, C., Silva, R.: Counterfactual fairness. In: Advances in Neural Information Processing Systems, pp. 4069–4079 (2017)
Larson, J., Mattu, S., Kirchner, L., Angwin, J.: How we analyzed the COMPAS recidivism algorithm. ProPublica 9 (2016)
Laugel, T., Lesot, M.-J., Marsala, C., Renard, X., Detyniecki, M.: Inverse classification for comparison-based interpretability in machine learning. arXiv preprint arXiv:1712.08443 (2017)
Lipton, P.: Contrastive explanation. R. Inst. Philos. Suppl. 27, 247–266 (1990)
Mahajan, D., Tan, C., Sharma, A.: Preserving causal constraints in counterfactual explanations for machine learning classifiers. arXiv preprint arXiv:1912.03277 (2019)
Makhlouf, K., Zhioua, S., Palamidessi, C.: Survey on causal-based machine learning fairness notions. arXiv preprint arXiv:2010.09553 (2020)
Molnar, C.: Interpretable Machine Learning. Lulu.com (2020)
Morton, A.: Contrastive knowledge. Contrastivism Philos. 101–115 (2013)
Morucci, M., Orlandi, V., Roy, S., Rudin, C., Volfovsky, A.: Adaptive hyper-box matching for interpretable individualized treatment effect estimation. In: Adams, R.P., Gogate, V. (eds.) Proceedings of the Thirty-Sixth Conference on Uncertainty in Artificial Intelligence, UAI 2020, Virtual Online, 3–6 August 2020. Proceedings of Machine Learning Research, vol. 124, pp. 1089–1098. AUAI Press (2020)
Mothilal, R.K., Sharma, A., Tan, C.: Explaining machine learning classifiers through diverse counterfactual explanations. In: Proceedings of the 2020 Conference on Fairness, Accountability, and Transparency, pp. 607–617 (2020)
Nabi, R., Shpitser, I.: Fair inference on outcomes. In: Proceedings of the AAAI Conference on Artificial Intelligence. AAAI Conference on Artificial Intelligence, vol. 2018, p. 1931. NIH Public Access (2018)
Neyman, J.: On the application of probability theory to agricultural experiments. Essay on Principles. Section 9. PhD thesis, Roczniki Nauk Rolniczych Tom X [in Polish] (1923). Translated in Statistical Science, vol. 5, pp. 465–480
Ogburn, E.L., Shpitser, I., Lee, Y.: Causal inference, social networks, and chain graphs (2018)
Ogburn, E.L., Sofrygin, O., Diaz, I., van der Laan, M.J.: Causal inference for social network data (2017)
Pearl, J.: Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. Morgan Kaufmann (1988)
Pearl, J.: Causality: Models, Reasoning, and Inference, 2nd edn. Cambridge University Press, Cambridge (2009)
Pearl, J.: Comment: understanding Simpson’s paradox. In: Probabilistic and Causal Inference: The Works of Judea Pearl, pp. 399–412 (2022)
Pearl, J., et al.: Causal inference in statistics: an overview. Stat. Surv. 3, 96–146 (2009)
Pearl, J., Glymour, M., Jewell, N.P.: Causal Inference in Statistics: A Primer. Wiley, Hoboken (2016)
Pfohl, S.R., Duan, T., Ding, D.Y., Shah, N.H.: Counterfactual reasoning for fair clinical risk prediction. In: Machine Learning for Healthcare Conference, pp. 325–358. PMLR (2019)
Pradhan, R., Zhu, J., Glavic, B., Salimi, B.: Interpretable data-based explanations for fairness debugging. In: SIGMOD (2022)
Ribeiro, M.T., Singh, S., Guestrin, C.: “Why should I trust you?” Explaining the predictions of any classifier. In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 1135–1144 (2016)
Ribeiro, M.T., Singh, S., Guestrin, C.: Anchors: high-precision model-agnostic explanations. In: AAAI, vol. 18, pp. 1527–1535 (2018)
Rosenbaum, P.R.: Observational Study. Wiley, Hoboken (2005)
Rosenbaum, P.R.: Design of Observational Studies, vol. 10. Springer, Heidelberg (2010)
Rosenbaum, P.R., Rubin, D.B.: The central role of the propensity score in observational studies for causal effects. Biometrika 70(1), 41–55 (1983)
Rosenbaum, P.R., Rubin, D.B.: Reducing bias in observational studies using subclassification on the propensity score. J. Am. Stat. Assoc. 79(387), 516–524 (1984)
Rubin, D.B.: Matching to remove bias in observational studies. Biometrics 159–183 (1973)
Rubin, D.B.: Estimating causal effects of treatments in randomized and nonrandomized studies. J. Educ. Psychol. 66(5), 688 (1974)
Rubin, D.B.: Causal inference using potential outcomes. J. Am. Stat. Assoc. 100(469), 322–331 (2005)
Russell, C., Kusner, M.J., Loftus, J., Silva, R.: When worlds collide: integrating different counterfactual assumptions in fairness. In: Advances in Neural Information Processing Systems, pp. 6414–6423 (2017)
Salimi, B., Cole, C., Ports, D.R.K., Suciu, D.: ZaliQL: causal inference from observational data at scale. Proc. VLDB Endow. 10(12), 1957–1960 (2017)
Salimi, B., Howe, B., Suciu, D.: Data management for causal algorithmic fairness. Data Eng. 24 (2019)
Salimi, B., Parikh, H., Kayali, M., Getoor, L., Roy, S., Suciu, D.: Causal relational learning. In: Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data, pp. 241–256 (2020)
Salimi, B., Rodriguez, L., Howe, B., Suciu, D.: Interventional fairness: causal database repair for algorithmic fairness. In: Proceedings of the 2019 International Conference on Management of Data, pp. 793–810. ACM (2019)
Schwab, P., Karlen, W.: CXPlain: causal explanations for model interpretation under uncertainty. arXiv preprint arXiv:1910.12336 (2019)
Stuart, E.A.: Matching methods for causal inference: a review and a look forward. Stat. Sci. 25(1), 1–21 (2010)
Ustun, B., Spangher, A., Liu, Y.: Actionable recourse in linear classification. In: Proceedings of the Conference on Fairness, Accountability, and Transparency, pp. 10–19 (2019)
Verma, S., Rubin, J.: Fairness definitions explained. In: 2018 IEEE/ACM International Workshop on Software Fairness (FairWare), pp. 1–7. IEEE (2018)
Wachter, S., Mittelstadt, B., Russell, C.: Counterfactual explanations without opening the black box: automated decisions and the GDPR. Harv. JL Tech. 31, 841 (2017)
Wang, J., Wiens, J., Lundberg, S.: Shapley flow: a graph-based approach to interpreting model predictions. In: International Conference on Artificial Intelligence and Statistics, pp. 721–729. PMLR (2021)
Wang, T., et al.: FLAME: a fast large-scale almost matching exactly approach to causal inference. J. Mach. Learn. Res. 22, 31:1–31:41 (2021)
Woodward, J.: Making Things Happen: A Theory of Causal Explanation. Oxford University Press, Oxford (2005)
Zliobaite, I.: A survey on measuring indirect discrimination in machine learning. arXiv preprint arXiv:1511.00148 (2015)

Author information

Authors and Affiliations

Duke University, Durham, NC, USA
Sudeepa Roy
University of California, San Diego, CA, USA
Babak Salimi

Corresponding author

Correspondence to Sudeepa Roy.

Editor information

Editors and Affiliations

SKEMA Business School, Montreal, Canada
Leopoldo Bertossi
University of Bergen, Bergen, Norway
Guohui Xiao

About this chapter

Cite this chapter

Roy, S., Salimi, B. (2023). Causal Inference in Data Analysis with Applications to Fairness and Explanations. In: Bertossi, L., Xiao, G. (eds) Reasoning Web. Causality, Explanations and Declarative Knowledge. Lecture Notes in Computer Science, vol 13759. Springer, Cham. https://doi.org/10.1007/978-3-031-31414-8_3

DOI: https://doi.org/10.1007/978-3-031-31414-8_3
Published: 28 April 2023
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-31413-1
Online ISBN: 978-3-031-31414-8
eBook Packages: Computer Science, Computer Science (R0)