The field of Artificial Intelligence has seen dramatic progress over the last 15 years. Using machine learning, that is, methods that enable software systems to automatically learn and improve from digitized experience, researchers and practitioners alike have developed practical applications that are indispensable and strongly facilitate people's everyday lives (Jordan and Mitchell 2015). Pervasive examples include object recognition (e.g., Facebook's Moments and Intel Security's True Key), natural language processing (e.g., DeepL and Google Translate), recommender systems (e.g., recommendations by Netflix or iTunes), and digital assistants (e.g., Alexa and Siri).
At their core, these applications have in common that highly complex and increasingly opaque networks of mathematical constructs are trained on historical data to make predictions about an uncertain state of the world. Based on large sets of labeled images, Deep Convolutional Neural Networks, for instance, can learn to make highly accurate individual-level predictions about the presence of diseases, including COVID-19 (Shi et al. 2020). While highly accurate predictions are in and of themselves vital to informing fact-based decision-making (regarding disease detection even in a literal sense), the high predictive performance of state-of-the-art machine learning models generally comes at the expense of the transparency and interpretability of their outputs (Voosen 2017; Du et al. 2019). Put differently, most high-performance machine learning models are unable to convey human-interpretable information about how and why they produce specific predictions. Such machine learning applications are therefore often complete black boxes to their human users, and even to their expert designers, who frequently lack an understanding of the reasons behind decision-critical outputs.
From a methodological point of view, the inability to provide an explanation that accompanies specific predictions creates three types of high-level problems.
First, unaddressed opacity creates an immediate lack of accountability, as it impedes the auditing of such systems' predictions. This shortcoming has sparked concerns about the rise of a black box society in which opaque algorithmic decision-making processes in organizations and institutions entail unintended and unanticipated downstream ramifications that change things for the worse (Pasquale 2015; Angwin et al. 2016; Obermeyer et al. 2019).
Second, the potential of AI to enhance economic efficiency and human welfare is not limited to informing specific decisions through predictions. Revealing new domain knowledge hidden in complex Big Data structures appears to be another highly promising avenue (Teso and Hinz 2020). Organizations and institutions may therefore harness machine learning systems to confront human users with their own errors and teach them to improve their domain knowledge (Metcalfe 2017). Using machine learning applications to help humans widen their horizons of reasoning and understanding, however, requires systems that explain their inherent reasoning in a human-understandable way and address the pitfalls of human learning processes.
Third, the black-box nature of machine learning applications can hamper their acceptance by users, which in turn likely impedes their integration into existing processes. Naturally, reaping a technology's benefits presupposes its actual use, which will not occur if a system's opacity inspires resistance and broad aversion. Especially where a machine learning model's outputs contradict human experience and intuition, providing an interpretable explanation is of utmost importance to avert tensions in human–machine collaboration, and thereby resistance (Ribeiro et al. 2016).
Overcoming machine learning models' opacity and creating techniques that produce human-interpretable explanations while maintaining high predictive performance is not only a methodologically desirable objective. It also yields immediate operational benefits from technological, social, economic, legal, and psychological perspectives. Specifically, model interpretability is a prerequisite for (i) optimizing and debugging models, (ii) detecting inaccurate discriminatory patterns, (iii) monitoring continuous learning processes, (iv) the technology's adoption by its intended users, (v) accountability and responsibility, and (vi) users' ability to harness models as teachers that enhance their knowledge and skills.
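To make the notion of a human-interpretable explanation concrete, the following is a minimal sketch of one widely used model-agnostic technique, permutation feature importance: each input feature is shuffled in turn, and the resulting drop in predictive accuracy indicates how strongly the (black-box) model relies on that feature. The synthetic data, the stand-in scoring rule, and all names are illustrative assumptions, not taken from the works cited above.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: the label depends strongly on x0, weakly on x1, not at all on x2.
X = rng.normal(size=(500, 3))
y = (3.0 * X[:, 0] + 0.5 * X[:, 1]
     + rng.normal(scale=0.1, size=500) > 0).astype(int)

def black_box_predict(X):
    # Stand-in for an opaque model; in practice this would be any trained
    # black-box classifier's prediction function.
    return (3.0 * X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)

def accuracy(y_true, y_pred):
    return float(np.mean(y_true == y_pred))

baseline = accuracy(y, black_box_predict(X))

# Permutation importance: shuffle one feature at a time and record the
# drop in accuracy; larger drops mark features the model relies on.
importances = []
for j in range(X.shape[1]):
    X_perm = X.copy()
    X_perm[:, j] = rng.permutation(X_perm[:, j])
    importances.append(baseline - accuracy(y, black_box_predict(X_perm)))

print({f"x{j}": round(imp, 3) for j, imp in enumerate(importances)})
```

Such a summary already delivers a first, coarse kind of accountability: it reveals which inputs drive the model's outputs without requiring access to its internals, which is precisely what benefits (i), (ii), and (v) above presuppose.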
Considering that model interpretability is a key factor in determining whether machine learning technologies can live up to their promise of unforeseen efficiency and welfare gains (Rahwan et al. 2019), it is not surprising that policymakers have caught on to this issue as well. With the General Data Protection Regulation (GDPR), which took effect in 2018, the European Union effectively grants people a right to obtain an explanation of when and why an algorithm produced a specific, personally consequential decision (Parliament and Council of the European Union 2016, Sect. 2, Art. 13–15; Sect. 4, Art. 21, 22; Goodman and Flaxman 2017). As ever-more complex machine learning applications are rapidly integrated into business processes, regulators will almost certainly introduce additional measures to maintain legal oversight over algorithmic systems. Since the (automatic) provision of human-readable explanations for algorithmic outputs arguably constitutes a natural angle for doing so, the scientific study and examination of interpretable machine learning are important from an operational compliance perspective as well.