Explainable artificial intelligence (XAI) post-hoc explainability methods: risks and limitations in non-discrimination law

Vale, Daniel; El-Sharif, Ali; Ali, Muhammed

doi:10.1007/s43681-022-00142-y

Explainable artificial intelligence (XAI) post-hoc explainability methods: risks and limitations in non-discrimination law

Original Research
Published: 15 March 2022

Volume 2, pages 815–826, (2022)
Cite this article

AI and Ethics Aims and scope Submit manuscript

5768 Accesses
27 Citations
Explore all metrics

Abstract

Organizations are increasingly employing complex black-box machine learning models in high-stakes decision-making. A popular approach to addressing the problem of opacity of black-box machine learning models is the use of post-hoc explainability methods. These methods approximate the logic of underlying machine learning models with the aim of explaining their internal workings, so that human examiners can understand them. In turn, it has been alluded that the insights from post-hoc explainability methods can be used to help regulate black-box machine learning. This article examines the validity of these claims. By examining whether the insights derived from post-hoc explainability methods in post-model deployment can prima facie meet legal definitions in European (read European Union) non-discrimination law, we argue that machine learning post-hoc explanation methods cannot guarantee the insights they generate.

Ultimately, we argue that the use of post-hoc explanatory methods is useful in many cases, but that these methods have limitations that prohibit reliance as the sole mechanism to guarantee fairness of model outcomes in high-stakes decision-making. By way of an ancillary function, the inadequacy of European Non-Discrimination Law for algorithmic decision-making is demonstrated too.

This is a preview of subscription content, log in via an institution to check access.

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

AI bias: exploring discriminatory algorithmic decision-making models and the application of possible machine-centric solutions adapted from the pharmaceutical industry

Article 10 February 2022

Unboxing the Black Box of Artificial Intelligence: Algorithmic Transparency and/or a Right to Functional Explainability

AI’s fairness problem: understanding wrongful discrimination in the context of automated decision-making

Article Open access 16 November 2022

References

Adadi, A., Berrada, M.: Peeking inside the Black-box: a survey on explainable artificial intelligence (XAI). IEEE Access 6, 52138–52160 (2018). https://doi.org/10.1109/access.2018.2870052
Article Google Scholar
Ahmad, M.A., Eckert, C., Teredesai, A.:. Interpretable machine learning in healthcare. In: Proceedings of the 2018 ACM International Conference on Bioinformatics, Computational Biology, and Health Informatics. https://doi.org/10.1145/3233547.3233667 (2018)
Alom, M., Taha, T., Yakopcic, C., Westberg, S., Sidike, P., Nasrin, M., Asari, V., et al.: The history began from alexnet: a comprehensive survey on deep learning approaches. https://arxiv.org/abs/1803.01164 (2018)
Angwin, J., Larson, J., Mattu, S., & Kirchner, L. (2016, May 23). Machine Bias. Retrieved from ProPublica: https://www.propublica.org/article/machine-bias-risk-assessments-in-criminal-sentencing
Baehrens, D., Schroeter, T., Harmeling, S., Kawanabe, M., Hansen, K., Muller, K.: How to explain individual classification decisions. J Mach Learn Res 1803–1831. https://dl.acm.org/doi/pdf/10.5555/1756006.1859912 (2010)
Barredo Arrieta, A., Díaz-Rodríguez, N., Del Ser, J., Bennetot, A., Tabik, S., Barbado, A., Garcia, S., Gil-Lopez, S., Molina, D., Benjamins, R., Chatila, R., Herrera, F.: Explainable artificial intelligence (XAI): concepts, taxonomies, opportunities and challenges toward responsible AI. Inf Fusion 58, 82–115 (2020). https://doi.org/10.1016/j.inffus.2019.12.012
Article Google Scholar
Bodria, F., Giannotti, F., Guidotti, R., Naretto, F., Pedreschi, D., Rinzivillo, S.:. Benchmarking and survey of explanation methods for black box models. https://arxiv.org/pdf/2102.13076.pdf (2021)
Bratko, I.: Machine learning: between accuracy and interpretability. In: Della Riccia, G., Lenz, H.-J., Kruse, R. (eds.) Learning, Networks and Statistics. ICMS, vol. 382, pp. 163–177. Springer, Vienda (1997)
Chapter MATH Google Scholar
Breiman, L.: Statistical modeling: the two cultures (with comments and a rejoinder by the author). Stat Sci (2001). https://doi.org/10.1214/ss/1009213726
Article MATH Google Scholar
Burkart, N., Huber, M.F.: A survey on the explainability of supervised machine learning. J Artif Intell Res 70, 245–317 (2021). https://doi.org/10.1613/jair.1.12228
Article MathSciNet MATH Google Scholar
Burrell, J.: How the machine ‘thinks’: understanding opacity in machine learning algorithms. Big Data Soc. 3(1), 205395171562251 (2016). https://doi.org/10.1177/2053951715622512
Article Google Scholar
Camburu, O., Giunchiglia, E., Foerster, J., Lukasiewicz, T., Blunsom, P.: Can I trust the explainer? Verifying post-hoc explanatory methods. https://arxiv.org/abs/1910.02065 (2019)
Carvalho, D.V., Pereira, E.M., Cardoso, J.S.: Machine learning Interpretability: a survey on methods and metrics. Electronics 8(8), 832 (2019). https://doi.org/10.3390/electronics8080832
Article Google Scholar
Choi, E., Bahadori, M., Kulas, J., Schuetz, A., Stewart, W., Sun, J.: RETAIN: an interpretable predictive model for healthcare using reverse time attention mechanism. Adv Neural Inf Process Syst 3504–3512 (2016). https://arxiv.org/abs/1608.05745
Council of Europe: European Court of Human Rights: Handbook on European non-discrimination law. Council of Europe: European Court of Human Rights, Strasburg (2018)
Google Scholar
Covert, I., Lundberg, S., Lee, S.: Explaining by removing: a unified framework for model explanation. https://arxiv.org/abs/2011.14878 (2020)
Cranor, L.: A framework for reasoning about the human in the loop. https://www.usenix.org/legacy/event/upsec/tech/full_papers/cranor/cranor.pdf (2008)
Deng, J., Dong, W., Socher, R., Li, L., Kai, L., Li, F.-F.: ImageNet: a large-scale hierarchical image database. In: 2009 IEEE Conference on Computer Vision and Pattern Recognition. https://doi.org/10.1109/cvpr.2009.5206848 (2009)
Douglas-Scott, S.: The European Union and human rights after the treaty of Lisbon. Hum. Rights Law Rev. 11(4), 645–682 (2011). https://doi.org/10.1093/hrlr/ngr038
Article Google Scholar
Doyle, O.: Direct discrimination, indirect discrimination and autonomy. Oxf. J. Leg. Stud. 27(3), 537–553 (2007). https://doi.org/10.1093/ojls/gqm008
Article Google Scholar
Du, M., Liu, N., Hu, X.: Techniques for interpretable machine learning. Commun. ACM 63(1), 68–77 (2019). https://doi.org/10.1145/3359786
Article Google Scholar
Dwork, C., Hardt, M., Pitassi, T., Reingold, O., Zemel, R.: Fairness through awareness. In: Proceedings of the 3rd Innovations in Theoretical Computer Science Conference on-ITCS '12. https://doi.org/10.1145/2090236.2090255 (2012)
Dwork, C., Immorlica, N., Kalai, A.T., Leiserson, M.: Decoupled classifiers for fair and efficient machine learning. https://arxiv.org/abs/1707.06613 (2017)
Ellis, E., Watson, P.: Key concepts in EU anti-discrimination law. EU Anti-Discrimination Law (2012). https://doi.org/10.1093/acprof:oso/9780199698462.003.0004
Article Google Scholar
Ernst, C.: Artificial intelligence and autonomy: self-determination in the age of automated systems. Regulat. Artif. Intell. (2019). https://doi.org/10.1007/978-3-030-32361-5_3
Article Google Scholar
Feldman, M., Friedler, S.A., Moeller, J., Scheidegger, C., Venkatasubramanian, S.: Certifying and removing disparate impact. In: Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. https://doi.org/10.1145/2783258.2783311 (2015)
Floridi, L., Chiriatti, M.: GPT-3: its nature, scope, limits, and consequences. Mind. Mach. 30(4), 681–694 (2020). https://doi.org/10.1007/s11023-020-09548-1
Article Google Scholar
Foster, K.R., Koprowski, R., Skufca, J.D.: Machine learning, medical diagnosis, and biomedical engineering research - commentary. Biomed. Eng. Online 13(1), 94 (2014). https://doi.org/10.1186/1475-925x-13-94
Article Google Scholar
Gerards, J., Xenidis, R.: Algorithmic discrimination in Europe: Challenges and opportunities for gender equality and non-discrimination law. Publications Office of the European Union (2021)
Gilpin, L.H., Bau, D., Yuan, B.Z., Bajwa, A., Specter, M., Kagal, L.: Explaining explanations: an overview of interpretability of machine learning. In: 2018 IEEE 5th International Conference on Data Science and Advanced Analytics (DSAA). https://doi.org/10.1109/dsaa.2018.00018 (2018)
Girasa, R.: AI US policies and regulations. Intell. Disrupt. Technol Artif (2020). https://doi.org/10.1007/978-3-030-35975-1_3
Article Google Scholar
Guidotti, R., Monreale, A., Ruggieri, S., Turini, F., Giannotti, F., Pedreschi, D.: A survey of methods for explaining black box models. ACM Comput. Surv. 51(5), 1–42 (2019). https://doi.org/10.1145/3236009
Article Google Scholar
Guiraudon, V.: Equality in the making: implementing European non-discrimination law. Citizsh. Stud. 13(5), 527–549 (2009). https://doi.org/10.1080/13621020903174696
Article Google Scholar
Hall, P., Gill, N., Schmidt, P.: Proposed guidelines for the responsible use of explainable machine learning. https://arxiv.org/abs/1906.03533 (2019)
Hall, P., Gill, N., Kurka, M., Phan, W.: Machine learning interpretability with H20 driverless AI. Mountain View: H20. https://www.h2o.ai/wp-content/uploads/2017/09/MLI.pdf (2017)
Hand, D.J.: Classifier technology and the illusion of progress. Stat. Sci. (2006). https://doi.org/10.1214/088342306000000060
Article MathSciNet MATH Google Scholar
Hardt, M., Price, E., Srebro, N.: Equality of opportunity in supervised learning. In: Advances in Neural Information Processing Systems. https://arxiv.org/abs/1610.02413 (2016)
Kantola, J., Nousiainen, K.: The European Union: Initiator of a New European Anti-Discrimination Regime? In: Krizsan A, Skjeie H, Squires J (eds) Institutionalizing Intersectionality: The Changing Nature of European Equality Regimes. Palgrave Macmillan (2012)
Larson, J., Mattu, S., Kirchner, L., Angwin, J.: How we analyzed the COMPAS recidivism algorithm. ProPublica 1–16 (2016). https://www.propublica.org/article/how-we-analyzed-the-compas-recidivism-algorithm
Laugel, T., Lesot, M., Marsala, C., Renard, X., Detyniecki, M.: The dangers of post-hoc interpretability: unjustified counterfactual explanations. In: Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence. https://doi.org/10.24963/ijcai.2019/388 (2019)
Lundberg, S.M., Erion, G., Chen, H., DeGrave, A., Prutkin, J.M., Nair, B., Katz, R., Himmelfarb, J., Bansal, N., Lee, S.: From local explanations to global understanding with explainable AI for trees. Nat. Mach. Intell. 2(1), 56–67 (2020). https://doi.org/10.1038/s42256-019-0138-9
Article Google Scholar
Mair, J.: Direct discrimination: limited by definition? Int. J. Discrim. Law 10(1), 3–17 (2009). https://doi.org/10.1177/135822910901000102
Article Google Scholar
Maliszewska-Nienartowicz, J.: Direct and indirect discrimination in European Union Law—how to draw a dividing line? Int. J. Soc. Sci. 41–55 (2014). https://www.iises.net/download/Soubory/soubory-puvodni/pp041-055_ijoss_2014v3n1.pdf
Molnar, C., Konig, G., Herbinger, J., Freiesleben, T., Dandl, S., Scholbeck, C. A., Casalicchio G., Grosse-Wentrup M., Bischl, B.: Pitfalls to avoid when interpreting machine learning models. https://arxiv.org/abs/2007.04131 (2020)
Meske, C., Bunde, E.: Transparency and trust in human–AI-interaction: the role of model-agnostic explanations in computer vision-based decision support. Artif. Intell. HCI (2020). https://doi.org/10.1007/978-3-030-50334-5_4
Article Google Scholar
Montavon, G., Lapuschkin, S., Binder, A., Samek, W., Müller, K.: Explaining nonlinear classification decisions with deep Taylor decomposition. Pattern Recogn. 65, 211–222 (2017). https://doi.org/10.1016/j.patcog.2016.11.008
Article Google Scholar
Murdoch, W.J., Singh, C., Kumbier, K., Abbasi-Asl, R., Yu, B.: Definitions, methods, and applications in interpretable machine learning. Proc. Natl. Acad. Sci. 116(44), 22071–22080 (2019). https://doi.org/10.1073/pnas.1900654116
Article MathSciNet MATH Google Scholar
Narayanan, A.: Translation tutorial: 21 fairness definitions and their politics. In: Proceedings of the Conference on. Fairness Accountability Transparency. https://fairmlbook.org/tutorial2.html (2018)
Nie, L., Wang, M., Zhang, L., Yan, S., Zhang, B., Chua, T.: Disease inference from health-related questions via sparse deep learning. IEEE Trans. Knowl. Data Eng. 27(8), 2107–2119 (2015). https://doi.org/10.1109/tkde.2015.2399298
Article Google Scholar
Onishi, T., Saha, S.K., Delgado-Montero, A., Ludwig, D.R., Onishi, T., Schelbert, E.B., Schwartzman, D., Gorcsan, J.: Global longitudinal strain and global circumferential strain by speckle-tracking echocardiography and feature-tracking cardiac magnetic resonance imaging: comparison with left ventricular ejection fraction. J. Am. Soc. Echocardiogr. 28(5), 587–596 (2015). https://doi.org/10.1016/j.echo.2014.11.018
Article Google Scholar
O’Sullivan, S., Nevejans, N., Allen, C., Blyth, A., Leonard, S., Pagallo, U., Holzinger, K., Holzinger, A., Sajid, M.I., Ashrafian, H.: Legal, regulatory, and ethical frameworks for development of standards in artificial intelligence (AI) and autonomous robotic surgery. Int. J. Med. Robot. Comput. Assist. Surg. 15(1), e1968 (2019). https://doi.org/10.1002/rcs.1968
Article Google Scholar
Qian, K., Danilevsky, M., Katsis, Y., Kawas, B., Oduor, E., Popa, L., Li, Y.: XNLP: A living survey for XAI research in natural language processing. In: 26th International Conference on Intelligent User Interfaces. https://doi.org/10.1145/3397482.3450728 (2021)
Pasquale, F.: The black box society, the secret algorithms that control money and information. Cambridge, MA: Harvard University Press. https://doi.org/10.4159/harvard.9780674736061 (2015)
Pasquale, F.: Toward a fourth law of robotics: preserving attribution, responsibility, and explainability in an algorithmic society. Ohio State Law J. https://ssrn.com/abstract=3002546 (2017)
Pedreschi, D., Giannotti, F., Guidotti, R., Monreale, A., Ruggieri, S., Turini, F.: Meaningful explanations of black box AI decision systems. Proc. AAAI Conf. Artif. Intell. 33, 9780–9784 (2019). https://doi.org/10.1609/aaai.v33i01.33019780
Article Google Scholar
Ribeiro, M., Singh, S., Guestrin, C.: Model-agnostic interpretability of machine learning. https://arxiv.org/abs/1606.05386 (2016)
Ribeiro, M., Singh, S., Guestrin, C.: Why should i trust you?: Explaining the predictions of any classifier. In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. https://doi.org/10.1145/2939672.2939778 (2016)
Ringelheim, J.: The burden of proof in antidiscrimination proceedings. A focus on Belgium, France and Ireland. Eur. Equal. Law Rev. (2019). https://ssrn.com/abstract=3498346
Rissland, E.: AI and legal reasoning. In: Proceedings of the 9th International Joint Conference on Artificial Intelligence. https://dl.acm.org/doi/abs/10.5555/1623611.1623724 (1985)
Rissland, E.L., Ashley, K.D., Loui, R.: AI and law: a fruitful synergy. Artif. Intell. 150(1–2), 1–15 (2003). https://doi.org/10.1016/s0004-3702(03)00122-x
Article Google Scholar
Robnik-Šikonja, M., Bohanec, M.: Perturbation-based explanations of prediction models. Hum. Mach. Learn. (2018). https://doi.org/10.1007/978-3-319-90403-0_9
Article Google Scholar
Rudin, C.: Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead. Nature Machine Intelligence 1(5), 206–215 (2019). https://doi.org/10.1038/s42256-019-0048-x
Article Google Scholar
Samek, W., Montavon, G., Lapuschkin, S., Anders, C.J., Müller, K.R.: Explaining deep neural networks and beyond: a review of methods and applications. Proc. IEEE 109(3), 247–278 (2021). https://doi.org/10.1109/JPROC.2021.3060483
Article Google Scholar
Schwab, P., Karlen, W.: CXPlain: causal explanations for model interpretation under uncertainty. In: Advances in Neural Information Processing Systems. https://arxiv.org/abs/1910.12336 (2019)
Selbst, A.D., Barocas, S.: The intuitive appeal of explainable machines. SSRN Electron. J. (2018). https://doi.org/10.2139/ssrn.3126971
Article Google Scholar
Suresh, H., Gong, J.J., Guttag, J.V.: Learning tasks for multitask learning. In: Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. https://doi.org/10.1145/3219819.3219930 (2018)
Suresh, H., Guttag, J.: A framework for understanding unintended consequences of machine learning. https://arxiv.org/abs/1901.10002 (2019)
Tan, S., Caruana, R., Hooker, G., Lou, Y.: Distill-and-compare. In: Proceedings of the 2018 AAAI/ACM Conference on AI, Ethics, and Society. https://doi.org/10.1145/3278721.3278725 (2018)
Tischbirek, A.: Artificial intelligence and discrimination: discriminating against discriminatory systems. Regul. Artif. Intell. (2019). https://doi.org/10.1007/978-3-030-32361-5_5
Article Google Scholar
VanderWeele, T.J., Hernan, M.A.: Results on differential and dependent measurement error of the exposure and the outcome using signed directed acyclic graphs. Am. J. Epidemiol. 175(12), 1303–1310 (2012). https://doi.org/10.1093/aje/kwr458
Article Google Scholar
Verma, S., Rubin, J.: Fairness definitions explained. Proc. Int. Workshop Softw. Fairness (2018). https://doi.org/10.1145/3194770.3194776
Article Google Scholar
Viljoen, S.: Democratic data: a relational theory for data governance. SSRN Electron. J. (2020). https://doi.org/10.2139/ssrn.3727562
Article Google Scholar
Visani, G., Bagli, E., Chesani, F., Poluzzi, A., Capuzzo, D.: Statistical stability indices for LIME: obtaining reliable explanations for machine learning models. J. Oper. Res. Soc. (2021). https://doi.org/10.1080/01605682.2020.1865846
Article Google Scholar
Wachter, S., Mittelstadt, B., Russell, C.: Why fairness cannot be automated: bridging the gap between EU non-discrimination law and AI. SSRN Electron. J. (2020). https://doi.org/10.2139/ssrn.3547922
Article Google Scholar
Wachter, S., Mittelstadt, B., Russell, C.: Bias preservation in machine learning: the legality of fairness metrics under EU non-discrimination law. SSRN Electron. J. (2021). https://doi.org/10.2139/ssrn.3792772
Article Google Scholar
Wang, W., Siau, K., Keng, S.: Artificial intelligence: a study on governance, policies, and regulations. Association for Information Systems AIS Electronic Library. http://aisel.aisnet.org/mwais2018/40 (2018)
Wischmeyer, T.: Artificial intelligence and transparency: opening the black box. Regul. Artif. Intell. (2019). https://doi.org/10.1007/978-3-030-32361-5_4
Article Google Scholar
Wischmeyer, T., Rademacher, T.: Regulating Artificial Intelligence. International Springer Publications, New York City (2020). https://doi.org/10.1007/978-3-030-32361-5
Book Google Scholar
Zafar, M., Khan, N.: DLIME: a deterministic local interpretable model-agnostic explaination approach for computer-aided diagnosis systems. https://arxiv.org/abs/1906.10263 (2019)
Zemel, R., Wu, Y., Swersky, K., Pitassi, T., Dwork, C.: Learning fair representations. In: International Conference on Machine Learning, pp. 325–333. PMLR. https://proceedings.mlr.press/v28/zemel13.html (2013)
Zhang, Y., Song, S., Sun, Y., Tan, S., Udell, M.: "Why Should You Trust My Explaination?" Understanding uncertainty in LIME explanations. https://arxiv.org/abs/1904.12991 (2019)
Zuiderveen Borgesius, F.J.: Strengthening legal protection against discrimination by algorithms and artificial intelligence. Int. J. Hum. Rights 24(10), 1572–1593 (2020). https://doi.org/10.1080/13642987.2020.1743976
Article Google Scholar
Zuiderveen Borgesius, F.J.: Discrimination, artificial intelligence, and algorithmic decision-making. Council of Europe, Directorate General of Democracy. https://rm.coe.int/discrimination-artificial-intelligence-and-algorithmic-decision-making/1680925d73 (2018)

Download references

Author information

Authors and Affiliations

eLaw Centre, Leiden School of Law, Leiden University, Leiden, The Netherlands
Daniel Vale
College of Computing and Engineering, Nova Southeastern University, Fort Lauderdale, USA
Ali El-Sharif
UCL Knowledge Lab, University College London, London, UK
Muhammed Ali

Authors

Daniel Vale
View author publications
You can also search for this author in PubMed Google Scholar
Ali El-Sharif
View author publications
You can also search for this author in PubMed Google Scholar
Muhammed Ali
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Daniel Vale.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Reprints and permissions

About this article

Cite this article

Vale, D., El-Sharif, A. & Ali, M. Explainable artificial intelligence (XAI) post-hoc explainability methods: risks and limitations in non-discrimination law. AI Ethics 2, 815–826 (2022). https://doi.org/10.1007/s43681-022-00142-y

Download citation

Received: 22 September 2021
Accepted: 30 January 2022
Published: 15 March 2022
Issue Date: November 2022
DOI: https://doi.org/10.1007/s43681-022-00142-y

Keywords

Access this article

Log in via an institution

Price excludes VAT (USA)
Tax calculation will be finalised during checkout.

Instant access to the full article PDF.

Institutional subscriptions

Explainable artificial intelligence (XAI) post-hoc explainability methods: risks and limitations in non-discrimination law

Abstract

Access this article

Similar content being viewed by others

AI bias: exploring discriminatory algorithmic decision-making models and the application of possible machine-centric solutions adapted from the pharmaceutical industry

Unboxing the Black Box of Artificial Intelligence: Algorithmic Transparency and/or a Right to Functional Explainability

AI’s fairness problem: understanding wrongful discrimination in the context of automated decision-making

References

Author information

Authors and Affiliations

Corresponding author

Additional information

Publisher's Note

Rights and permissions

About this article

Cite this article

Keywords

Navigation

Explainable artificial intelligence (XAI) post-hoc explainability methods: risks and limitations in non-discrimination law

Abstract

Access this article

Similar content being viewed by others

AI bias: exploring discriminatory algorithmic decision-making models and the application of possible machine-centric solutions adapted from the pharmaceutical industry

Unboxing the Black Box of Artificial Intelligence: Algorithmic Transparency and/or a Right to Functional Explainability

AI’s fairness problem: understanding wrongful discrimination in the context of automated decision-making

References

Author information

Authors and Affiliations

Corresponding author

Additional information

Publisher's Note

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Search

Navigation