Abstract
Causal inference goes beyond correlation and model-based prediction, and is highly relevant in domains such as health, medicine, and the social sciences. It enables estimation of the impact of an intervention or treatment on the world, making it critical for sound and robust policy making. However, randomized controlled experiments, which are typically considered the gold standard for drawing causal conclusions, are often infeasible due to ethical, cost, or other constraints. Fortunately, there is a rich literature in Artificial Intelligence (AI), Machine Learning (ML), and Statistics on observational studies, that is, methods for causal inference on observed or collected data under certain assumptions. In this paper, we provide an overview of popular formal and rigorous techniques for causal inference on observed data from the AI and Statistics literature. Furthermore, we discuss how concepts from causal inference can be used to reason about fairness and enable explainability in machine learning models, both of which are critical in responsible data science when ML is used to make high-stakes decisions. Our discussion highlights the importance of causal inference for ML models and provides insights into how to develop more transparent and responsible AI systems.
Notes
- 1. For proof and intuition behind the back-door criterion, along with other sufficient conditions, see [55].
References
New York Times article on crime and summer (2009). https://www.nytimes.com/2009/06/19/nyregion/19murder.html?smid=url-share
Alvarez-Melis, D., Jaakkola, T.S.: On the robustness of interpretability methods. arXiv preprint arXiv:1806.08049 (2018)
Apley, D.W., Zhu, J.: Visualizing the effects of predictor variables in black box supervised learning models. arXiv preprint arXiv:1612.08468 (2016)
Avin, C., Shpitser, I., Pearl, J.: Identifiability of path-specific effects (2005)
Awan, M.U., Morucci, M., Orlandi, V., Roy, S., Rudin, C., Volfovsky, A.: Almost-matching-exactly for treatment effect estimation under network interference. In: Chiappa, S., Calandra, R. (eds.) The 23rd International Conference on Artificial Intelligence and Statistics, AISTATS 2020, 26–28 August 2020, Online, Palermo, Sicily, Italy. Proceedings of Machine Learning Research, vol. 108, pp. 3252–3262. PMLR (2020)
Bickel, P.J., Hammel, E.A., O’Connell, J., et al.: Sex bias in graduate admissions: data from Berkeley. Science 187(4175), 398–404 (1975)
Chiappa, S.: Path-specific counterfactual fairness. In: Proceedings of the AAAI Conference on Artificial Intelligence, vol. 33, pp. 7801–7808 (2019)
Chouldechova, A.: Fair prediction with disparate impact: a study of bias in recidivism prediction instruments. Big Data 5(2), 153–163 (2017)
Corbett-Davies, S., Pierson, E., Feller, A., Goel, S., Huq, A.: Algorithmic decision making and the cost of fairness. In: Proceedings of the 23rd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 797–806. ACM (2017)
Cox, D.R.: The regression analysis of binary sequences (with discussion). J. Roy. Stat. Soc. B 20, 215–242 (1958)
De Graaf, M.M.A., Malle, B.F.: How people explain action (and autonomous intelligent systems should too). In: 2017 AAAI Fall Symposium Series (2017)
Dieng, A., Liu, Y., Roy, S., Rudin, C., Volfovsky, A.: Interpretable almost-exact matching for causal inference. In: Chaudhuri, K., Sugiyama, M. (eds.) The 22nd International Conference on Artificial Intelligence and Statistics, AISTATS 2019, 16–18 April 2019, Naha, Okinawa, Japan. Proceedings of Machine Learning Research, vol. 89, pp. 2445–2453. PMLR (2019)
Dwork, C., Hardt, M., Pitassi, T., Reingold, O., Zemel, R.S.: Fairness through awareness. In: ITCS, pp. 214–226. ACM (2012)
Fisher, A., Rudin, C., Dominici, F.: Model class reliance: variable importance measures for any machine learning model class, from the “Rashomon” perspective. arXiv preprint arXiv:1801.01489, p. 68 (2018)
Fisher, R.A.: The Design of Experiments. Oliver and Boyd, Oxford (1935)
Friedman, J.H.: Greedy function approximation: a gradient boosting machine. Ann. Stat. 1189–1232 (2001)
Frye, C., Feige, I., Rowat, C.: Asymmetric Shapley values: incorporating causal knowledge into model-agnostic explainability. arXiv preprint arXiv:1910.06358 (2019)
Funk, M.J., Westreich, D., Wiesen, C., Stürmer, T., Brookhart, M.A., Davidian, M.: Doubly robust estimation of causal effects. Am. J. Epidemiol. 173, 761–767 (2011)
Galhotra, S., Brun, Y., Meliou, A.: Fairness testing: testing software for discrimination. In: Proceedings of the 2017 11th Joint Meeting on Foundations of Software Engineering, pp. 498–510. ACM (2017)
Galhotra, S., Pradhan, R., Salimi, B.: Explaining black-box algorithms using probabilistic contrastive counterfactuals. In: Proceedings of the International Conference on Management of Data, pp. 577–590 (2021)
Gerstenberg, T., Goodman, N.D., Lagnado, D.A., Tenenbaum, J.B.: How, whether, why: causal judgments as counterfactual contrasts. In: CogSci (2015)
Goldstein, A., Kapelner, A., Bleich, J., Pitkin, E.: Peeking inside the black box: visualizing statistical learning with plots of individual conditional expectation. J. Comput. Graph. Stat. 24(1), 44–65 (2015)
Greenwell, B.M., Boehmke, B.C., McCarthy, A.J.: A simple and effective model-based variable importance measure. arXiv preprint arXiv:1805.04755 (2018)
Grynaviski, E.: Contrasts, counterfactuals, and causes. Eur. J. Int. Rel. 19(4), 823–846 (2013)
Guidotti, R., Monreale, A., Ruggieri, S., Turini, F., Giannotti, F., Pedreschi, D.: A survey of methods for explaining black box models. ACM Comput. Surv. (CSUR) 51(5), 1–42 (2018)
Hahn, P.R., Murray, J.S., Carvalho, C.: Bayesian regression tree models for causal inference: regularization, confounding, and heterogeneous effects (2017)
Hernán, M.A., Robins, J.M.: Causal inference (2010)
Heskes, T., Sijben, E., Bucur, I.G., Claassen, T.: Causal Shapley values: exploiting causal knowledge to explain individual predictions of complex models. arXiv preprint arXiv:2011.01625 (2020)
Holland, P.W.: Statistics and causal inference. J. Am. Stat. Assoc. 81(396), 945–960 (1986)
Hooker, G.: Discovering additive structure in black box functions. In: Proceedings of the Tenth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 575–580 (2004)
Hooker, G., Mentch, L.: Please stop permuting features: an explanation and alternatives. arXiv preprint arXiv:1905.03151 (2019)
Iacus, S.M., King, G., Porro, G., Katz, J.N.: Causal inference without balance checking: coarsened exact matching. Polit. Anal. 1–24 (2012)
Imbens, G.W., Rubin, D.B.: Causal Inference in Statistics, Social, and Biomedical Sciences. Cambridge University Press, Cambridge (2015)
Islam, M.T., Fariha, A., Meliou, A., Salimi, B.: Through the data management lens: experimental analysis and evaluation of fair classification. In: Proceedings of the 2022 International Conference on Management of Data, pp. 232–246 (2022)
Karimi, A.-H., Barthe, G., Balle, B., Valera, I.: Model-agnostic counterfactual explanations for consequential decisions. arXiv preprint arXiv:1905.11190 (2019)
Karimi, A.-H., Barthe, G., Schölkopf, B., Valera, I.: A survey of algorithmic recourse: contrastive explanations and consequential recommendations. ACM Comput. Surv. 55(5), 1–29 (2022)
Karimi, A.-H., von Kügelgen, J., Schölkopf, B., Valera, I.: Algorithmic recourse under imperfect causal knowledge: a probabilistic approach. arXiv preprint arXiv:2006.06831 (2020)
Kilbertus, N., Carulla, M.R., Parascandolo, G., Hardt, M., Janzing, D., Schölkopf, B.: Avoiding discrimination through causal reasoning. In: Advances in Neural Information Processing Systems, pp. 656–666 (2017)
Kumar, I.E., Venkatasubramanian, S., Scheidegger, C., Friedler, S.: Problems with Shapley-value-based explanations as feature importance measures. In: International Conference on Machine Learning, pp. 5491–5500. PMLR (2020)
Kusner, M.J., Loftus, J., Russell, C., Silva, R.: Counterfactual fairness. In: Advances in Neural Information Processing Systems, pp. 4069–4079 (2017)
Larson, J., Mattu, S., Kirchner, L., Angwin, J.: How we analyzed the COMPAS recidivism algorithm. ProPublica 9 (2016)
Laugel, T., Lesot, M.-J., Marsala, C., Renard, X., Detyniecki, M.: Inverse classification for comparison-based interpretability in machine learning. arXiv preprint arXiv:1712.08443 (2017)
Lipton, P.: Contrastive explanation. R. Inst. Philos. Suppl. 27, 247–266 (1990)
Mahajan, D., Tan, C., Sharma, A.: Preserving causal constraints in counterfactual explanations for machine learning classifiers. arXiv preprint arXiv:1912.03277 (2019)
Makhlouf, K., Zhioua, S., Palamidessi, C.: Survey on causal-based machine learning fairness notions. arXiv preprint arXiv:2010.09553 (2020)
Molnar, C.: Interpretable Machine Learning (2020). Lulu.com
Morton, A.: Contrastive knowledge. Contrastivism Philos. 101–115 (2013)
Morucci, M., Orlandi, V., Roy, S., Rudin, C., Volfovsky, A.: Adaptive hyper-box matching for interpretable individualized treatment effect estimation. In: Adams, R.P., Gogate, V. (eds.) Proceedings of the Thirty-Sixth Conference on Uncertainty in Artificial Intelligence, UAI 2020, Virtual Online, 3–6 August 2020. Proceedings of Machine Learning Research, vol. 124, pp. 1089–1098. AUAI Press (2020)
Mothilal, R.K., Sharma, A., Tan, C.: Explaining machine learning classifiers through diverse counterfactual explanations. In: Proceedings of the 2020 Conference on Fairness, Accountability, and Transparency, pp. 607–617 (2020)
Nabi, R., Shpitser, I.: Fair inference on outcomes. In: Proceedings of the AAAI Conference on Artificial Intelligence. AAAI Conference on Artificial Intelligence, vol. 2018, p. 1931. NIH Public Access (2018)
Neyman, J.: On the application of probability theory to agricultural experiments. Essay on Principles. Section 9. PhD thesis, Roczniki Nauk Rolniczych Tom X [in Polish] (1923). Translated in Statistical Science, vol. 5, pp. 465–480
Ogburn, E.L., Shpitser, I., Lee, Y.: Causal inference, social networks, and chain graphs (2018)
Ogburn, E.L., Sofrygin, O., Diaz, I., van der Laan, M.J.: Causal inference for social network data (2017)
Pearl, J.: Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference. Morgan Kaufmann (1988)
Pearl, J.: Causality: Models, Reasoning, and Inference, 2nd edn. Cambridge University Press, Cambridge (2009)
Pearl, J.: Comment: understanding Simpson’s paradox. In: Probabilistic and Causal Inference: The Works of Judea Pearl, pp. 399–412 (2022)
Pearl, J., et al.: Causal inference in statistics: an overview. Stat. Surv. 3, 96–146 (2009)
Pearl, J., Glymour, M., Jewell, N.P.: Causal Inference in Statistics: A Primer. Wiley, Hoboken (2016)
Pfohl, S.R., Duan, T., Ding, D.Y., Shah, N.H.: Counterfactual reasoning for fair clinical risk prediction. In: Machine Learning for Healthcare Conference, pp. 325–358. PMLR (2019)
Pradhan, R., Zhu, J., Glavic, B., Salimi, B.: Interpretable data-based explanations for fairness debugging. In: SIGMOD (2022)
Ribeiro, M.T., Singh, S., Guestrin, C.: “Why should I trust you?” Explaining the predictions of any classifier. In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pp. 1135–1144 (2016)
Ribeiro, M.T., Singh, S., Guestrin, C.: Anchors: high-precision model-agnostic explanations. In: AAAI, vol. 18, pp. 1527–1535 (2018)
Rosenbaum, P.R.: Observational Study. Wiley, Hoboken (2005)
Rosenbaum, P.R.: Design of Observational Studies, vol. 10. Springer, Heidelberg (2010)
Rosenbaum, P.R., Rubin, D.B.: The central role of the propensity score in observational studies for causal effects. Biometrika 70(1), 41–55 (1983)
Rosenbaum, P.R., Rubin, D.B.: Reducing bias in observational studies using subclassification on the propensity score. J. Am. Stat. Assoc. 79(387), 516–524 (1984)
Rubin, D.B.: Matching to remove bias in observational studies. Biometrics 159–183 (1973)
Rubin, D.B.: Estimating causal effects of treatments in randomized and nonrandomized studies. J. Educ. Psychol. 66(5), 688 (1974)
Rubin, D.B.: Causal inference using potential outcomes. J. Am. Stat. Assoc. 100(469), 322–331 (2005)
Russell, C., Kusner, M.J., Loftus, J., Silva, R.: When worlds collide: integrating different counterfactual assumptions in fairness. In: Advances in Neural Information Processing Systems, pp. 6414–6423 (2017)
Salimi, B., Cole, C., Ports, D.R.K., Suciu, D.: ZaliQL: causal inference from observational data at scale. Proc. VLDB Endow. 10(12), 1957–1960 (2017)
Salimi, B., Howe, B., Suciu, D.: Data management for causal algorithmic fairness. Data Eng. 24 (2019)
Salimi, B., Parikh, H., Kayali, M., Getoor, L., Roy, S., Suciu, D.: Causal relational learning. In: Proceedings of the 2020 ACM SIGMOD International Conference on Management of Data, pp. 241–256 (2020)
Salimi, B., Rodriguez, L., Howe, B., Suciu, D.: Interventional fairness: causal database repair for algorithmic fairness. In: Proceedings of the 2019 International Conference on Management of Data, pp. 793–810. ACM (2019)
Schwab, P., Karlen, W.: CXPlain: causal explanations for model interpretation under uncertainty. arXiv preprint arXiv:1910.12336 (2019)
Stuart, E.A.: Matching methods for causal inference: a review and a look forward. Stat. Sci. 25(1), 1–21 (2010)
Ustun, B., Spangher, A., Liu, Y.: Actionable recourse in linear classification. In: Proceedings of the Conference on Fairness, Accountability, and Transparency, pp. 10–19 (2019)
Verma, S., Rubin, J.: Fairness definitions explained. In: 2018 IEEE/ACM International Workshop on Software Fairness (FairWare), pp. 1–7. IEEE (2018)
Wachter, S., Mittelstadt, B., Russell, C.: Counterfactual explanations without opening the black box: automated decisions and the GDPR. Harv. JL Tech. 31, 841 (2017)
Wang, J., Wiens, J., Lundberg, S.: Shapley flow: a graph-based approach to interpreting model predictions. In: International Conference on Artificial Intelligence and Statistics, pp. 721–729. PMLR (2021)
Wang, T., et al.: FLAME: a fast large-scale almost matching exactly approach to causal inference. J. Mach. Learn. Res. 22, 31:1–31:41 (2021)
Woodward, J.: Making Things Happen: A Theory of Causal Explanation. Oxford University Press, Oxford (2005)
Zliobaite, I.: A survey on measuring indirect discrimination in machine learning. arXiv preprint arXiv:1511.00148 (2015)
© 2023 The Author(s), under exclusive license to Springer Nature Switzerland AG
About this chapter
Cite this chapter
Roy, S., Salimi, B. (2023). Causal Inference in Data Analysis with Applications to Fairness and Explanations. In: Bertossi, L., Xiao, G. (eds) Reasoning Web. Causality, Explanations and Declarative Knowledge. Lecture Notes in Computer Science, vol 13759. Springer, Cham. https://doi.org/10.1007/978-3-031-31414-8_3
DOI: https://doi.org/10.1007/978-3-031-31414-8_3
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-31413-1
Online ISBN: 978-3-031-31414-8
eBook Packages: Computer Science, Computer Science (R0)