Motivation and research needs

Artificial Intelligence (AI) has potentially far-reaching applications that can influence people’s private and professional lives (Meske et al., 2022). These include the identification of diseases (Aignostics; Meske et al., 2022), job recruitment (iVCV; Sipior et al., 2021), public security (Intelligent Artifact), and risk assessment when granting loans (Wang et al., 2019; ZEST AI). The models used in these instances are often highly complex black boxes (Adadi & Berrada, 2018), meaning that the ability to understand the models’ underlying AI processes—and thus the reasons for their decisions—is severely limited. This is problematic because the comprehensibility, explainability, and justification of decisions are of great importance for many applications, including in the health, finance, and energy sectors (Meske et al., 2022).

Although AI is already used for a wide range of activities and provides various benefits, many decision-makers, such as managers and executive board members, remain reluctant to integrate AI technologies because of their limited understanding of them (Barredo Arrieta et al., 2020). This issue can be addressed by Explainable Artificial Intelligence (XAI) methods, which emphasize the need to make complex models and algorithms understandable and reproducible to humans (Meske et al., 2022). According to Gilpin et al. (2018), the term “explainability” refers to “models that are able to summarize the reasons for neural network behavior, gain the trust of users, or produce insights about the causes of their decisions” (p. 80). Moreover, it is not sufficient to gain the trust and understanding of model users alone; the trust and understanding of other relevant stakeholders, such as managers, regulators, AI developers, and users or people affected by model decisions, are also necessary (Barredo Arrieta et al., 2020).

XAI research includes feature engineering (Wambsganss et al., 2021), algorithmic development and testing (Förster et al., 2021; Xie et al., 2022; Zschech et al., 2022), risks and opportunities (Meske et al., 2022), principles for ethical AI utilization (HLEG-AI, 2019; Seppälä et al., 2021; Thiebes et al., 2021), and the adoption, trust, and usage behavior of XAI (Hamm et al., 2021; Hemmer et al., 2022; Lockey et al., 2021; Stroppiana Tabankov & Möhlmann, 2021). XAI services, both from startups and from established companies like Google, increasingly appear in electronic marketplaces. They offer value creation through applications such as image annotation for healthcare or botanical purposes (Zegami), fraud detection for cybersecurity (Fiddler), or decision support for financial investments (Google Cloud; ZEST AI). In such applications, dashboards, what-if scenarios, and no-code models promise to provide explainability and justified decision-making (DataRobot; Google Cloud; Lagoon).

The worldwide revenue of the XAI market was valued at 4.4 billion U.S. dollars in 2021 and is forecast to grow to 21 billion U.S. dollars by 2030 (Statista, 2022). These XAI services vary widely in terms of target group, purpose, utilized model, and degree of explainability. While several approaches to classify the various design options can be found in the literature (e.g., Adadi & Berrada, 2018; Barredo Arrieta et al., 2020), there is a lack of literature-based overviews of existing design options for XAI models in connection with real-world, commercially available XAI services. Thus, we address the following research question (RQ):

  • RQ1: Which XAI design options can be extracted from the literature using a morphological analysis?

To address RQ1, we perform a morphological analysis following Ritchey (2011) and Zwicky (1967) to conceptualize the existing approaches in the literature. We create a morphological box (MBox) as the result of the morphological analysis and structure all the design options of XAI services according to their dimensions and characteristics (Ritchey, 2011).

Haag et al. (2022) observed that many companies are unable to exploit the full potential of AI methods for corporate processes due to a lack of knowledge about AI methods, their application areas, and their possible benefits. Therefore, stakeholders need decision support to identify suitable design options, use cases, and business models; such support, especially when employing XAI solutions and services, lowers the entrance threshold for AI. Real-world XAI services are provided by companies specialized in data science, which offer commercially available complete XAI solutions or XAI cloud platforms, such as Dataiku, DataRobot, and ZEST AI. This leads us to our second RQ:

  • RQ2: Which archetypical business models can be deduced from classifying real-world XAI services, and how can XAI stakeholders be supported in selecting suitable XAI services for their requirements?

To address RQ2, we apply the conceptual MBox to classify 40 real-world XAI services according to its dimensions and characteristics and deduce archetypical XAI business models. This allows us to compare literature and practice and to develop a decision support framework. The latter takes the form of a decision tree for decision-makers and other relevant stakeholders in companies and organizations, thus allowing them to integrate XAI solutions and services into their corporate processes.

RQ1 addresses the current understanding of XAI design options in the literature, and RQ2 focuses on the description and distribution of XAI design options in industrial applications, here in the context of real-world XAI services. A comparison of the literature-based morphological analysis and the archetype analysis of real-world XAI services could reveal possible similarities and differences concerning XAI design options in theory and practice. We thus address a third RQ:

  • RQ3: What are the differences between XAI design options in theory and practice, and what impact might these differences have?

This paper is organized as follows. First, we describe the theoretical background and our research design. Based on this, we present the morphological analysis and assess to what extent it matches real-world XAI applications. Then, we perform a cluster analysis and deduce seven archetypical business models. Finally, we develop a decision support framework as a decision tree; discuss our results, findings, and limitations; and derive recommendations for further research and practice.

Theoretical background

According to Kaplan and Haenlein (2019), AI is a “system’s ability to correctly interpret external data, learn from such data, and use those learnings to achieve specific goals and tasks through flexible adaptation” (p. 15). The use of AI is expected to grow rapidly in the coming years; according to forecasts, the market for AI software will reach a global revenue of $126 billion by 2025 (Omdia, 2021). AI influences both private lives and entire business models, affecting decision-making in areas such as the healthcare, finance, and energy sectors (Haenlein & Kaplan, 2019). Tasks performed by AI can either complement or replace human work (Meske et al., 2022). AI often generates predictions and provides recommendations based on large amounts of data (Kibria et al., 2018). However, the diversity of AI’s potential applications raises the question of which decisions should be made by AI models and which should not (Haenlein & Kaplan, 2019). The question is difficult because AI can result in a range of benefits, challenges, and risks, all of which must be weighed against each other (Meske et al., 2022). Due to computers’ rapidly increasing processing capacities, high-performance AI systems are possible today. Indeed, in some cases, such as breast cancer detection, the performance of AI exceeds that of humans, underscoring its utility (McKinney et al., 2020). This is especially true for Machine Learning (ML) approaches that use Artificial Neural Networks (ANNs). However, Deep Learning models such as ANNs are complex and increasingly opaque. Such models are referred to as “black-box models” (Barredo Arrieta et al., 2020). The term refers to models for which it is difficult to understand the internal training and operation of the algorithm, making it a challenge to interpret how the algorithm’s outputs are obtained (Adadi & Berrada, 2018).

A potential risk of AI usage is the bias it can introduce in various forms, including automation, discrimination, and statistical bias (Meske et al., 2022). Automation bias refers to the tendency to over-rely on decisions made by a computer system even when one’s own judgment would be more accurate (Goddard et al., 2012; Meske et al., 2022). For example, medical doctors may make decisions based on AI results even though they would have made a different diagnosis without AI. This type of bias arises because humans often tend to accept the recommendations of decision support systems without critically questioning them (Goddard et al., 2012). Meanwhile, discrimination bias can involve, for example, racial or gender discrimination because human bias is present in training data (e.g., text and web corpora; Caliskan et al., 2017; Meske et al., 2022). Finally, statistical bias is the potential distortion between results calculated using historical data and actual data (Meske et al., 2022).

XAI addresses these challenges and risks (Adadi & Berrada, 2018). The goal of addressing the lack of trust and transparency associated with AI models is a major contributor to the emergence of XAI (Adadi & Berrada, 2018; Gilpin et al., 2018; Lipton, 2018). According to Barredo Arrieta et al. (2020), the drivers of XAI depend on the individual stakeholders. For example, users of a model, such as physicians and insurance agents, want to trust an AI model and gain scientific knowledge; regulators want to certify an AI model’s compliance with applicable legislation; managers want to evaluate regulatory compliance and understand enterprise AI applications; data scientists, developers, and product owners want to improve product efficiency and develop new functionalities; and people affected by AI model decisions want to understand their situation and verify the fairness of decisions (Barredo Arrieta et al., 2020).

To create explainability, the two goals of interpretability and completeness are addressed; however, it is challenging to achieve both goals simultaneously (Gilpin et al., 2018). Completeness refers to the system’s accuracy (i.e., the accuracy of the model; Gilpin et al., 2018), while interpretability describes whether the reasons behind a decision are directly understandable to humans without further explanation (Doshi-Velez & Kim, 2017; Gilpin et al., 2018; Guidotti et al., 2019). However, according to Lipton (2018), “interpretability does not reference a monolithic concept” (p. 42). This means that different AI approaches (e.g., linear regression, Bayesian models, support vector machines, or multi-layer neural networks) achieve different levels of explainability, either through transparent AI models, which are explainable by design, or through post-hoc methods, which provide explanatory information about an already developed model (Barredo Arrieta et al., 2020). While post-hoc methods often fail to provide insights into exactly how a model works, they may nonetheless provide valuable information for practitioners and users of ML (Lipton, 2018). According to Gilpin et al. (2018), “given the purpose and type of explanation, it is not obvious what the best type of explanation metric is and should be” (p. 88).

There are many ways to achieve XAI, each of which provides different levels and understandings of interpretability (Doshi-Velez & Kim, 2017; Gilpin et al., 2018; Kim, 2018; Lipton, 2018). Adadi and Berrada (2018), Gilpin et al. (2018), and Guidotti et al. (2019) illustrated the variety of XAI techniques, including model distillation, layer-wise relevance propagation, surrogate models, and feature importance. They also discussed these techniques in relation to the global or local scope of interpretability and their post-hoc or by-design explanations. Our research shows XAI models’ design options and their explainability targets, and we also demonstrate the relative prevalence of these design options in the real world.

Research design and research methods

The research questions address a complex problem concerning decision support for interested stakeholders in XAI design, development, and application. Our research design is structured into three phases: morphological analysis, classification and clustering, and decision support framework development. We present our research procedure in Table 1, then describe it step by step in the following section.

Table 1 Research design and research methods

Phase 1

In the first phase, we performed a morphological analysis to identify design options for XAI applications. This analysis allowed us to structure and conceptualize our research topic within the literature, reducing the complexity of the multidimensional problem and identifying the interplay between dimensions and characteristics (Ritchey, 2011). This builds the theoretical foundation for the archetype identification and the development of the decision support framework, but it can also contribute to literature and practice on its own.

The first step of morphological analysis is to search and review the relevant literature to identify the dimensions and their specific characteristics. In the context of MBox development, it is mandatory that only one characteristic in each dimension be selected (Ritchey, 2011).

We conducted a systematic literature review in line with Templier and Paré (2015), Watson and Webster (2020), Webster and Watson (2002), and vom Brocke et al. (2015). We browsed the academic databases IEEE Xplore, AIS eLibrary, ScienceDirect, SpringerLink, and ACM Digital Library and searched for articles containing the following keywords in the title or abstract: “XAI” OR “explainable AI” OR “explainable artificial intelligence” AND “taxonomy” OR “framework” OR “components” OR “design” OR “design options” AND “business” OR “business model” OR “service” AND “methods” OR “system” OR “model.” Articles had to be peer-reviewed and published between 2017 and 2022.

The keyword-based database search identified 203 scientific publications. After our screening, we excluded all publications not focusing on XAI design options or frameworks; twelve papers remained after these exclusions. Furthermore, we performed a backward, forward, author, and Google Scholar similarity search based on the most important articles from the keyword-based literature search (e.g., Adadi & Berrada, 2018; Barredo Arrieta et al., 2020). This step yielded ten additional publications, after which we reached saturation, as the full-text screening revealed no significant novel XAI design options. Thus, we included 22 scientific publications in the second step of creating our MBox. All 22 publications can be found in the MBox as references.

Phase 2

To evaluate the theoretical and literature-based MBox, we classified 40 real-world XAI services within the MBox’s dimensions and characteristics. This also allowed us to create a data set for our archetype analysis and for our development of the decision support framework. Our data set consists of the 40 XAI services on the vertical axis, while the horizontal axis defines the characteristics and their corresponding dimensions from the MBox. Each XAI service was then checked to determine which characteristic matches each dimension. Only one characteristic can be selected for each dimension (see online Appendix A, https://osf.io/b8r7j/?view_only=2a5e19822eb34b618a1ee219936576a7).

To find the real-world XAI services, we used the database crunchbase.com, which provides business and corporate information related to technology companies (Weking et al., 2020), and the search engine Google. We searched for the following keywords: “XAI” OR “explainable AI” OR “explainable artificial intelligence” AND “services” OR “applications” OR “solutions” OR “companies” OR “startups.” This search identified 78 companies offering XAI services in various disciplines. Due to insufficient information on the companies’ websites, we excluded 38 companies. Finally, we classified 40 XAI companies according to the dimensions and characteristics of the MBox and constructed a vector for each examined object along the dimensions.

In Step 4, we conducted a cluster analysis, which allowed us to discover a structure, identify patterns in the data set, and group the classified real-world XAI services (see online Appendix A, https://osf.io/b8r7j/?view_only=2a5e19822eb34b618a1ee219936576a7). According to Kundisch et al. (2021), cluster analysis can be conducted as an evaluation of the MBox with the goal of “better describing, identifying, classifying, analyzing, and clustering objects that represent a certain phenomenon compared to doing so without a taxonomy or other classification schemes” (p. 9). XAI services with similarly classified characteristics according to our MBox are grouped into one cluster (Kaufman & Rousseeuw, 1990). We applied the k-means algorithm to cluster the data set with a predefined number of clusters (Kaufman & Rousseeuw, 1990). The k-means algorithm (see online Appendix B, https://osf.io/b8r7j/?view_only=2a5e19822eb34b618a1ee219936576a7) is an established partitional clustering method whose advantage is that it clusters the data set based on centroids and distances in a simple way; however, the number of centroids, that is, the number of clusters, must be determined a priori (Saputra et al., 2020). To find the optimal number of clusters, we used the elbow and silhouette methods (Punj & Stewart, 1983; Rousseeuw, 1987). These methods allowed us to determine how close the data points within a cluster are to each other and how well separated the clusters are from one another (Saputra et al., 2020). Based on the clustering results, we derived our archetypical patterns of XAI business models by identifying the similarities among the clustered services and the focus of each archetype.
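To illustrate this step, a minimal Python sketch of the cluster-number selection is shown below; the actual analysis was run in RStudio (online Appendix B), and the binary service-by-characteristic matrix here is only a random placeholder.

import numpy as np
from sklearn.cluster import KMeans
from sklearn.metrics import silhouette_score

# Placeholder for the 40 classified XAI services: one 0/1 row per service,
# one column per MBox characteristic retained for clustering.
rng = np.random.default_rng(42)
X = rng.integers(0, 2, size=(40, 48))

for k in range(2, 11):
    km = KMeans(n_clusters=k, n_init=10, random_state=42).fit(X)
    print(f"k={k:2d}  inertia={km.inertia_:8.2f}  "
          f"silhouette={silhouette_score(X, km.labels_):.3f}")
# The elbow method looks for a kink in the inertia curve; the silhouette method
# selects the k with the highest average silhouette width (seven in our data).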

Phase 3

In the third phase, we developed the decision support framework in the form of a decision tree. This framework provides decision support in selecting the most suitable XAI business model for relevant stakeholders. A decision tree is a helpful tool for decision-making in relation to the previously identified XAI business models and archetypes. It is easy to understand and easy to use. Due to the clear structure of the tree with its root nodes, the tree offers decision rules and clearly indicates dependencies (Kamiński et al., 2018). The decision tree can serve as a support framework for decision-makers such as managers, product owners, and data scientists purchasing or programming novel XAI products. The multitude of options to integrate explainability into AI models can be overwhelming, and it is a major responsibility for many decision-makers. Different XAI model requirements can be queried using our decision tree, and based on these answers, a recommendation will be made regarding which XAI business model and archetype should be selected.

We implemented the decision tree algorithm using the Python-based ML toolbox scikit-learn (Pedregosa et al., 2011). The archetypes are the recommendations of the decision tree, while the selected characteristics are the respective data features that the model obtains for training. This creates individual vectors for the 40 XAI services (see online Appendix C, https://osf.io/b8r7j/?view_only=2a5e19822eb34b618a1ee219936576a7). The archetypes are the output that the decision tree tries to predict. The decision tree algorithm (see online Appendix G, https://osf.io/b8r7j/?view_only=2a5e19822eb34b618a1ee219936576a7) produces binary questions, such as “Will the explainability be integrated into the AI model post-hoc or not?” We manually transformed the answer “or not” into the other possible answers extracted from the MBox and the cluster analysis results. To make the decision tree more useful, we added archetype-specific design recommendations that XAI developers can follow to select the best-suited services for their requirements.
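A minimal sketch of this setup follows; the feature columns, service rows, and archetype labels below are invented for illustration and are not the actual classifications from online Appendix C.

import numpy as np
from sklearn.tree import DecisionTreeClassifier, export_text

# Invented one-hot features: 1 = the characteristic applies to the service.
feature_names = ["post_hoc", "model_agnostic", "global_scope", "multiple_motivations"]
X = np.array([
    [1, 0, 0, 0],
    [1, 1, 1, 0],
    [1, 0, 0, 1],
    [0, 1, 1, 0],
    [1, 0, 1, 1],
    [0, 1, 0, 0],
])
y = ["Archetype 3", "Archetype 2", "Archetype 1",
     "Archetype 5", "Archetype 4", "Archetype 6"]  # illustrative labels only

tree = DecisionTreeClassifier(random_state=0).fit(X, y)
print(export_text(tree, feature_names=feature_names))  # one binary question per node

# Querying the tree with a new requirement profile returns a recommended archetype.
print(tree.predict([[1, 1, 1, 0]]))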

Literature review and morphological analysis

Based on the systematic literature review, we identified four publications that classify XAI models in general: Adadi and Berrada (2018), Barredo Arrieta et al. (2020), Li et al. (2020), and Mohseni et al. (2021). These papers served as a basic framework for the development of the MBox and were supplemented by additional topic-specific papers. This allowed us to structure the MBox into three layers: objectives, classification of XAI methods, and XAI methods (see Table 2). The objectives layer addresses the target of integrating XAI. This includes the motivation for XAI (e.g., Adadi & Berrada, 2018; Meske et al., 2022) and the goals of several types of XAI users, including AI novices, data experts, and AI experts (Mohseni et al., 2021). Both the classification of XAI methods layer and the XAI methods layer originate from Adadi and Berrada (2018), who group XAI strategies according to the complexity, scope, and dependency level of the AI model. XAI methods concern concrete XAI techniques, such as visualization and example-based explanations (Adadi & Berrada, 2018).

Table 2 Morphological box

Objectives

Explainability can be incorporated into advanced AI solutions for various reasons (D1). Adadi and Berrada (2018) classified these reasons as follows: explain to justify (C1,1), which entails a fair decision-making process; explain to control (C1,2), which develops a better understanding of the algorithm; explain to improve (C1,3) the algorithm; and explain to discover (C1,4), which examines the relationships between data.

Once a company has decided to develop an XAI application for one or more of the motivations (multiple C1,5), various goals for different user groups can be pursued during its implementation: AI novice goals (D2), data expert goals (D3), and AI expert goals (D4). AI novices refer to end-users who apply AI technologies in their everyday lives but have a limited understanding of their underlying systems (Mohseni et al., 2021). For AI novices, the goals of algorithmic transparency (C2,1), trust and reliance (C2,2), bias mitigation (C2,3), and privacy awareness (C2,4) may be relevant to XAI development (Carvalho et al., 2019; Gerlings et al., 2021; Mohseni et al., 2021). Data experts are data scientists or domain experts who use AI to gain insights from data. Though they have a particularly good understanding of their application area, they are unfamiliar with the technical processes required to make AI work. Data experts may be particularly interested in visualizing and inspecting the models (C3,1) and tuning and selecting (C3,2) models for specific problems. AI experts, by contrast, are responsible for developing, implementing, and continuously improving AI algorithms and explainability techniques. Model interpretability (C4,1) is an important criterion for AI experts because it helps them understand the AI’s processes for learning from data in general and making decisions in specific contexts. In addition, they use explainability techniques to improve the model and the underlying training process. The model debugging (C4,2) characteristic captures these goals. In conclusion, companies will pursue different goals depending on the target group for an XAI application (Mohseni et al., 2021).

Classification of XAI methods

XAI methods can be classified into four dimensions: complexity-related methods (D5; Adadi & Berrada, 2018), model-related methods (D6; Adadi & Berrada, 2018; Markus et al., 2021; Rai, 2019), scope-related methods (D7; Adadi & Berrada, 2018; Guidotti et al., 2019; Ivaturi et al., 2021; Setzu et al., 2021), and input data types (D8; Li et al., 2020; Linardatos et al., 2021).

The first dimension, complexity-related methods, can be divided into post-hoc (C5,1) and by design (C5,2) explanations (Adadi & Berrada, 2018; Alamri & Alharbi, 2021; Mohseni et al., 2021). The former are generated in addition to the black-box model after it has been built, while by design explanations arise during the model’s training phase (Alamri & Alharbi, 2021). Following Adadi and Berrada (2018), “the complexity of a machine-learning model is directly related to its interpretability” (p. 52147). According to them, more complex methods provide less interpretability, and simpler methods are more interpretable. However, there is an ongoing debate in the literature concerning the relationship between model complexity and accuracy. To analyze this relationship, Koziol and Weitz (2021) examined various pricing models and input data types (e.g., historical data, solvency data, and product data). They found that under normal circumstances, in their case a normal market environment, increased model complexity does not necessarily improve output accuracy and that the input data can also play a central role (Koziol & Weitz, 2021). In an earlier evaluation of the complexity and accuracy of different forecasting models, Ahlburg (1995) concluded that “it is too early to say whether simple models are more accurate than complex models or whether causal models are more accurate than noncausal models” (p. 287); this debate continues today.

The model-related methods can be divided into model-specific (C6,1) and model-agnostic (C6,2) interpretability techniques. Model-specific techniques can only be applied to a certain class of models or algorithms, while model-agnostic techniques can be used for any algorithm type (Adadi & Berrada, 2018; Markus et al., 2021; Rai, 2019). Moreover, model-specific techniques consider only certain model types when specific types of explanation are required. The disadvantage of these techniques is that selecting a model that provides a certain type of explanation often reduces the model’s representativeness (Adadi & Berrada, 2018). According to Adadi and Berrada (2018), although “model-agnostic interpretability techniques are convenient, they often rely on surrogate models or other approximations that can degrade the accuracy of the explanations they provide” (p. 52151). This is not the case for model-specific interpretations since they refer to a specific model (Adadi & Berrada, 2018).

There are two variations of the scope of interpretability: global (C7,1) and local (C7,2; Adadi & Berrada, 2018). Local interpretability means that only one specific decision can be explained. In contrast, global interpretability refers to understanding the entire system and the connection between input and output variables so that every decision is comprehensible (Adadi & Berrada, 2018; Guidotti et al., 2019; Ivaturi et al., 2021; Setzu et al., 2021). Though global interpretability is useful, it is difficult to implement. Conversely, local interpretability is easier to achieve and is commonly used (Adadi & Berrada, 2018).

Determining which method of explainability should be used also depends on the available input data type. Some models can be applied to data in tabular (C8,1), image (C8,2), text (C8,3), or graphical (C8,4) form (Li et al., 2020; Linardatos et al., 2021). If more than one of these data types (C8,1 to C8,4) applies, the characteristic multiple (C8,5) can be selected.

XAI methods

XAI methods can be classified into five dimensions: explanation by influence (D9), visual explanations (D10), explanation by simplification (D11), example-based explanations (D12), and text explanations (D13).

The first dimension is explanation by influence methods, which are applied to analyze the relevance or importance of a certain model feature to prediction performance (Adadi & Berrada, 2018; Barredo Arrieta et al., 2020). In the MBox, we identified three characteristics within this dimension: sensitivity analysis (C9,1), layer-wise relevance propagation (LRP; C9,2), and feature importance (C9,3). Sensitivity analysis aims to determine the influence of input or weight perturbations on the output (Ruck et al., 1990); it measures the usefulness of input features and identifies which feature has the most significant impact on the prediction (Kridel et al., 2020). The second characteristic is LRP (Bach et al., 2015), which includes different layers, such as the input, hidden, and output layers of ANNs. Starting at the output layer, a relevance value is calculated for every neuron in each layer depending on the weights, the activation, and the relevance value of the neurons of the deeper layer. In this way, a relevance value can be determined backward for every neuron down to the network’s input layer (Bach et al., 2015); thus, LRP “identifies pivotal properties for the prediction” (Adadi & Berrada, 2018, p. 52150). Finally, feature importance methods can provide either local or global explanations. One approach for global explanation is random forests (Breiman, 2001). Local explanations can be provided by Shapley additive explanations (SHAP), which measure each feature’s contribution to the prediction (Lundberg & Lee, 2017).
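As a simple illustration of the influence idea (a toy sensitivity analysis rather than any specific method above), the following Python sketch perturbs each input feature of a trained classifier and measures the average change in the predicted probability; the data and model are invented.

import numpy as np
from sklearn.ensemble import RandomForestClassifier

# Synthetic data in which only the first two features drive the label.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 4))
y = (X[:, 0] + 0.5 * X[:, 1] > 0).astype(int)
model = RandomForestClassifier(n_estimators=200, random_state=0).fit(X, y)

baseline = model.predict_proba(X)[:, 1]
for j in range(X.shape[1]):
    X_pert = X.copy()
    X_pert[:, j] += 0.5 * X[:, j].std()   # perturb feature j
    delta = np.abs(model.predict_proba(X_pert)[:, 1] - baseline).mean()
    print(f"feature {j}: mean |change in prediction| = {delta:.3f}")
# Features 0 and 1 should show the largest sensitivity, mirroring the generating rule.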

Visual explanation aims to illustrate an AI model’s behavior by analyzing the interactions of input features; it is often applied with other techniques to improve users’ understanding of the model (Barredo Arrieta et al., 2020). The literature distinguishes between partial dependence plots (PDP; C10,1), individual conditional expectation (ICE; C10,2), and feature relevance visualization (C10,3; Adadi & Berrada, 2018). PDPs visualize the average partial relationship between input variables and the predicted outcome of post-hoc interpretable AI algorithms. They can be classified as a model-agnostic XAI method that can achieve either local or global interpretability. In this context, the influence of one or several features on the prediction can be analyzed (Adadi & Berrada, 2018; Hakkoum et al., 2021). The second type of visual explanation is ICE, a model-agnostic method that enables local interpretability. While PDPs use the average effect of a feature on the prediction, ICE plots disaggregate the PDP and focus on specific instances (Adadi & Berrada, 2018). The selected features are modified (perturbed) in an iterative process while all other features remain unchanged (Li et al., 2020). If there are feature interactions, examining only average effects can lead to an erroneous estimation when predicted outcomes are heterogeneous (Curia, 2021). The third characteristic is feature relevance visualization, which aggregates several methods to visualize the relevance of specific features of an AI algorithm (Adadi & Berrada, 2018).
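As one way to produce such plots, scikit-learn’s inspection module can overlay ICE curves on a PDP; the data and model in the sketch below are invented and not tied to any service discussed here.

import numpy as np
import matplotlib.pyplot as plt
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.inspection import PartialDependenceDisplay

# Synthetic regression problem with a nonlinear and a quadratic feature effect.
rng = np.random.default_rng(1)
X = rng.uniform(-2, 2, size=(400, 3))
y = np.sin(X[:, 0]) + 0.5 * X[:, 1] ** 2 + rng.normal(scale=0.1, size=400)
model = GradientBoostingRegressor(random_state=1).fit(X, y)

# kind="both" draws the per-instance ICE curves and the averaged PDP curve.
PartialDependenceDisplay.from_estimator(model, X, features=[0, 1], kind="both")
plt.show()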

The third dimension of the XAI methods layer is explanation by simplification (D11). This includes all XAI concepts that develop completely new, explainable models based on the trained AI model. The objective is to achieve a less complex model while maintaining the same prediction accuracy (Barredo Arrieta et al., 2020). We identified three such methods in the literature search: rule extraction (C11,1), model distillation (C11,2), and surrogate models (C11,3).

In rule extraction, the knowledge the ANN gains through training is made explainable by extracting rules that approximate the ANN’s decision-making path using input and output data (Adadi & Berrada, 2018; Li et al., 2020). The second method is model distillation, which can be classified as a type of model compression. When applied to a deep neural network, a deep network called the teacher is trained with a large data set. If this model performs accurately, its knowledge can be transferred to a less complex model called the student. The technique aims to find a student that mimics the teacher, leading to a better understanding of the complex model while maintaining prediction accuracy (Adadi & Berrada, 2018). The next characteristic is the surrogate model, which is model-agnostic with either local or global interpretability (Hakkoum et al., 2021). In general, surrogate representations are approximations of the actual AI models. These approximated models are much simpler, reducing the complexity of the AI algorithm. To achieve an output that is as accurate as possible, surrogate representations are trained on the predictions of the black-box model using methods such as linear regression. This improves interpretability but can harm prediction performance (Adadi & Berrada, 2018).
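A minimal sketch of a global surrogate follows, assuming a neural network stands in for the black box: a shallow decision tree is fitted to the black box’s predictions, and its fidelity to those predictions is reported. The data and models are invented for illustration.

import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.tree import DecisionTreeRegressor
from sklearn.metrics import r2_score

# Synthetic data and an opaque "black-box" model.
rng = np.random.default_rng(2)
X = rng.uniform(-2, 2, size=(500, 4))
y = X[:, 0] * X[:, 1] + np.sin(X[:, 2]) + rng.normal(scale=0.1, size=500)
black_box = MLPRegressor(hidden_layer_sizes=(64, 64), max_iter=2000,
                         random_state=2).fit(X, y)

# Train the simpler surrogate on the black box's predictions, not on y.
y_bb = black_box.predict(X)
surrogate = DecisionTreeRegressor(max_depth=3, random_state=2).fit(X, y_bb)
fidelity = r2_score(y_bb, surrogate.predict(X))  # how well the surrogate mimics the black box
print(f"surrogate fidelity (R^2 w.r.t. black-box predictions): {fidelity:.2f}")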

The fourth dimension identified is example-based explanation methods (D12). These methods are useful if the distribution of the training data set is complex and difficult to understand (Li et al., 2020). In this dimension, a distinction between prototypes and criticisms (C12,1) and counterfactual explanations (C12,2) is made (Adadi & Berrada, 2018). In this context, “a prototype is a representative data instance from the original data set” and “a criticism is a data instance that is not well represented by the set of prototypes” (Li et al., 2020, p. 8). This method can provide insights into the distribution of the original data set, and criticisms are determined by maximizing the difference in the distribution between the data set and the prototype (Li et al., 2020). Finally, counterfactual explanations seek “to find the smallest change of the feature value so that it can change the prediction into the desired outcome” (Li et al., 2020, p. 8).
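As a toy illustration of the counterfactual idea quoted above (not a production method), the sketch below searches for the smallest single-feature change that flips a classifier’s prediction for one instance; data and model are invented.

import numpy as np
from sklearn.linear_model import LogisticRegression

# Synthetic binary classification problem.
rng = np.random.default_rng(3)
X = rng.normal(size=(300, 3))
y = (X[:, 0] - X[:, 2] > 0).astype(int)
clf = LogisticRegression().fit(X, y)

x = X[0].copy()
original = clf.predict([x])[0]
best = None  # (feature index, new value, |change|)
for j in range(x.size):
    for step in np.linspace(-3, 3, 121):
        x_cf = x.copy()
        x_cf[j] += step
        if clf.predict([x_cf])[0] != original and (best is None or abs(step) < best[2]):
            best = (j, x_cf[j], abs(step))
print(f"original class: {original}; smallest change found: set feature {best[0]} "
      f"to {best[1]:.2f} (|change| = {best[2]:.2f})")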

The last identified dimension within the layer of XAI methods is text explanation (D13). Text explanations provide natural language generated through a learning process that explains an AI model’s results. Thus, text explanation cannot be seen as a standalone explanation method. Instead, other techniques provide numbers or visualizations as input to the text explanation model, which outputs natural language explanations (Bennetot et al., 2019).

Classification and cluster analysis

To address RQ2, we performed a cluster analysis to identify the archetypical patterns of XAI business models using our MBox and classification results of real-world XAI services. We classified the XAI services by visiting the website of every XAI service provider (see Appendix 1). These websites describe each XAI service and possible use cases. We only included XAI services that directly state that they offer XAI methods. We examined the 40 XAI services with the dimensions and characteristics of our MBox, ensuring that only one characteristic was selected per dimension. All authors simultaneously and independently classified the 40 XAI services to fulfill the four-eyes principle for validated results. In the case of a disagreement about a characteristic’s classification, the authors discussed the classification results.

The resulting data set from classifying the 40 XAI services can be found in online Appendix A (https://osf.io/b8r7j/?view_only=2a5e19822eb34b618a1ee219936576a7). Based on this classification, we imported the data set into RStudio and clustered it using the k-means algorithm; the RStudio code can be found in online Appendix B (https://osf.io/b8r7j/?view_only=2a5e19822eb34b618a1ee219936576a7). This algorithm allows us to merge XAI services with the same characteristics into a cluster. Specifically, the data set consists of the 40 XAI services on the vertical axis and the MBox dimensions with their corresponding characteristics on the horizontal axis (see online Appendix A, https://osf.io/b8r7j/?view_only=2a5e19822eb34b618a1ee219936576a7). For each XAI service, we created a row of zeros and ones; within each dimension, exactly one characteristic is marked with a one, which denotes the applicable characteristic. The applied k-means algorithm grouped XAI services with similar marks for the same characteristics, so patterns of similarities and differences could be identified and incorporated into archetypal business models. However, the optimal number of clusters must be identified before clustering. For this, we followed the silhouette method (Punj & Stewart, 1983; Rousseeuw, 1987), which indicated that seven was the optimal number of clusters (see online Appendix D, https://osf.io/b8r7j/?view_only=2a5e19822eb34b618a1ee219936576a7). The graphical output of the elbow method indicated no clear result (see online Appendix E, https://osf.io/b8r7j/?view_only=2a5e19822eb34b618a1ee219936576a7). In addition, the plotted clusters indicated that the seven clusters are separated from each other and, at the same time, are not too small in the sense that no cluster contained only one XAI service (see online Appendix F, https://osf.io/b8r7j/?view_only=2a5e19822eb34b618a1ee219936576a7).
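The encoding and grouping logic can be sketched in Python as follows (the actual analysis used the RStudio code in online Appendix B); the number of characteristics per dimension below is invented purely to demonstrate the one-characteristic-per-dimension encoding.

import numpy as np
from sklearn.cluster import KMeans

# Hypothetical numbers of characteristics per MBox dimension.
dims = [4, 4, 2, 2, 2, 2, 2, 4]
rng = np.random.default_rng(7)

rows = []
for _ in range(40):                 # one row of zeros and ones per XAI service
    row = []
    for n_char in dims:             # mark exactly one characteristic per dimension
        one_hot = np.zeros(n_char)
        one_hot[rng.integers(n_char)] = 1
        row.extend(one_hot)
    rows.append(row)
X = np.array(rows)

labels = KMeans(n_clusters=7, n_init=10, random_state=7).fit_predict(X)
for c in range(7):
    print(f"cluster {c}: services {np.where(labels == c)[0].tolist()}")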

In addition, we accepted the tradeoff created by this number of clusters between the level of detail and the number of services in each cluster. If too few clusters are determined, too many XAI services are merged, which makes the clusters difficult to interpret; nevertheless, a cluster must consist of more than one service. Table 3 visualizes the cluster analysis results and shows the percentage distribution of the characteristics within the seven archetypes. The first column shows the percentage distribution across all examined XAI services. Each characteristic is color labeled, with 0% in white and 100% in dark gray. For example, for dimension D5, explainability integration, Archetype 1 shows 100% for the characteristic C5,1. For the cluster analysis, we deleted the characteristics C1,1, C2,1, and C8,4 because they did not appear in the real-world XAI services. All examined XAI services and their assigned archetypes are listed in online Appendix C (https://osf.io/b8r7j/?view_only=2a5e19822eb34b618a1ee219936576a7).

Table 3 Results of the cluster analysisa

Archetype 1—XAI to support decision-making

Archetype 1 consists of the three XAI services ZEST AI, DreamQuark, and Spin Analytics, which offer solutions for the financial industry (see their websites or their crunchbase.com descriptions). The goal for AI novices is to increase trust in and reliance on AI models, while the goal for AI experts is to interpret and debug the models. This archetype is characterized by a local scope of explainability, resulting in only model-specific explainability. Furthermore, explainability is added post-hoc to already developed AI models. The method requires tabular data and produces counterfactual, example-based explanations. This business model is characterized by the use cases of credit and risk decisions (Spin Analytics), lending decision-making (ZEST AI), and asset management (DreamQuark).

Archetype 2—XAI to improve corporate metrics

Archetype 2 includes five XAI services—Cycorp, Minerva Intelligence, Stratyfy, Cognino AI, and Corpy & Co.—in the finance, manufacturing, and healthcare sectors. They target workflow improvements that minimize corporate costs and maximize profits. The XAI services in this archetype are model-agnostic and thus have a global scope of explainability. Layer-wise relevance propagation is the primary method used to explain relevant features; no visual or example-based explanations are provided. Rule extraction methods are used to simplify the model and its results.

Archetype 3—XAI for no-code models

Archetype 3 consists of nine XAI services (e.g., Stride and Akkai Kaeru) that offer solutions for the finance and healthcare sectors. This archetype is characterized by providing no- or low-code models to simplify AI models and increase their usability. Explainability is integrated post-hoc for specific models on a local level, and no explanations by simplification or example-based explanations are provided.

Archetype 4—XAI for transparent and trustworthy AI

Archetype 4 consists of six XAI services that strengthen transparency and trust in AI for corporate decision support. The services are not model-agnostic; they add explainability post-hoc, and only specific models are explained. Multiple input data are used, such as videos and photos (iVCV) and financial data (DydonAI). For use cases such as underwriting (xcoring), recruiting (iVCV), and investment banking (SCALNYX), XAI can support strategic decision-making or recruiting by providing transparent results and causal justifications in a less biased way.

Archetype 5—XAI to leverage data

Archetype 5 consists of seven XAI services (e.g., Zegami, RISHI-XAI, and HMX) that improve the value creation of utilized data such as images, videos, and corporate data. XAI services in this business model are explainable by design in a model-agnostic way. To explain the influencing features of the AI model, this business model uses layer-wise relevance propagation. Corporations’ daily core activities (e.g., project management) can be carried out more efficiently by extracting insights from a large volume of historical data to accumulate useful knowledge about future results (RISHI-XAI).

Archetype 6—XAI to democratize data science

Archetype 6 includes eight XAI services (e.g., Fiddler, Dataiku, and Beyond Limits) that comprehensively democratize XAI models’ results. This archetype is collective in its specifications because it utilizes multiple characteristics. For this reason, we reexamined all eight XAI services in this archetype in detail using their websites and crunchbase.com descriptions and discovered that the archetype’s focus is on the input data and its impact on the AI model. Multiple visualization methods, such as dashboards, reports, and what-if scenarios (e.g., Fiddler), help reduce the time for error correction, improve efficiency and accuracy, and encourage trust in and adoption of AI technologies. Use cases include detecting damage on solar cells (HACARUS) and credit risk assessment for lenders or stock selection (Beyond Limits).

Archetype 7—XAI to uncover new insights

This is the smallest archetype. It features only two XAI services (Aignostics and clearbox.ai) and aims to discover data’s potential for various purposes. Aignostics provides a diagnosis platform to discover biomarkers in biological images to identify evidence of diseases, while clearbox.ai generates synthetic data to improve data sets or anonymize sensitive data. Prototypes and criticisms are utilized in this business model to explain data features by employing examples, but no explanation by simplification is provided.

Discussion and a decision support framework

Researchers and practitioners are interested in XAI business models to help them explore data relationships, improve AI methods, justify AI decisions, and control XAI technologies while simultaneously meeting user needs (Adadi & Berrada, 2018; Meske et al., 2022; Thiebes et al., 2021). In contrast, many other scientists have focused on XAI algorithms and proposed artifacts to increase unbiased AI decision-making (Xie et al., 2022) or to understand the behavior of an AI system (Polzer et al., 2022). To benefit from such solutions, users and interested stakeholders such as managers, data scientists, and AI developers must determine which XAI solution best fits their requirements.

According to Haag et al. (2022), though ML has been applied successfully in various contexts, its effectiveness remains constrained by firms’ limited knowledge of its possible uses. To address this limitation and RQ2, we developed a decision support framework to help stakeholders select XAI business models and design elements according to their needs for explainability and value creation. Our decision tree provides a market overview and clarifies the XAI selection process by asking binary questions whose answers lead to a specific archetypical business model. Since the archetypes have different purposes and methods, a particular business model can fit a certain decision-maker’s requirements better than others. In addition, the variety of XAI design options can be challenging and complex, so the decision tree helps to provide an initial overview. As described above in Phase 3 of the research method, we generated the decision tree based on the data set we created by classifying real-world XAI services within the literature-based MBox. Our classified dimensions and characteristics are the decision features, while the archetypes are the respective decision classes produced at the end of the decision tree. The final decision tree is illustrated in Fig. 1. Only questions that can be answered at the beginning of a planning phase for an XAI service are included, and a maximum of five questions is required to determine which business model and archetype to adopt.

Fig. 1 Decision tree

To explain the decision tree, we describe the left path. Q1 asks at which point explainability should be integrated into the AI model and can be answered with post-hoc or by design. This question divides the possible business models into two paths. QL2 asks whether XAI services should be integrated in a model-specific or model-agnostic way. If the model-agnostic approach is selected, Archetype 2 is recommended. If the model-specific approach is selected, the next question asks which goal the XAI service should pursue for data experts; the possible answers are model visualization and inspection alone, or model visualization and inspection combined with model tuning. If the second answer is given, Archetype 4 is recommended. If the first answer is given, the question regarding the motivation for XAI services follows, which can be answered either with explaining to improve AI models or with multiple motivations. If the first answer is given, Archetype 3 is recommended. If the second answer is given, the question regarding the scope of XAI services in the AI model follows. If the scope is global, Archetype 4 is recommended, while Archetype 1 is recommended for a local scope.

Furthermore, we have noticed that the range of XAI services on offer is becoming increasingly diverse and that its market volume is predicted to grow significantly (Statista, 2022). From the MBox, we observe that real-world XAI services can be classified well using the characteristics identified in the literature, so we did not add further dimensions or characteristics. From this classification, we deduced seven archetypes as business models, which we named according to their contribution to value creation. We noticed that the benefit accrues to either XAI application users, companies, or customers. “XAI to support decision-making,” “XAI for no-code models,” “XAI to leverage data,” and “XAI to democratize data science” provide the most benefit for users, and they can facilitate and accelerate AI system workflows. Furthermore, Archetypes 5 and 6 can extract more value from the AI models and data sets through techniques such as visualization. “XAI to improve corporate metrics” and “XAI to leverage data” provide the most benefit for companies using XAI services; new data-driven business opportunities can be accessed by leveraging data. Finally, “XAI for transparent and trustworthy AI” and “XAI to uncover new insights” benefit customers most. Customers or people affected by XAI models’ decisions can be sure that companies’ decisions are justified; in the healthcare sector, for example, patients can benefit from new diagnostic technologies to identify diseases.

With the help of our decision tree, we derived recommendations for decision-makers such as managers, developers, and data scientists interested in XAI solutions. The archetypes and design options recommended by the decision tree can help decision-makers identify which design options from the MBox should be considered for their particular explanation requirements. In addition, our study can help increase the acceptance and knowledge of regulatory authorities, users, and people affected by XAI models’ decisions.

To address RQ3, we mapped real-world XAI services to the MBox characteristics and identified differences in how often XAI methods are offered in practice (Table 3). The explanation by influence method (D9) is frequently used, as only 10% of the 40 XAI services do not use any methods listed in D9. In particular, sensitivity analysis (C9,1) and LRP (C9,2) are often applied (C9,1: 25%, C9,2: 35%). The method of rule extraction (C11,1) from the dimension of explanation by simplification is also frequently used (40%), as are text explanations (C13,1; 95%).

In contrast, example-based explanations are used less frequently, including prototypes and criticisms (C12,1: 5%) and counterfactual explanations (C12,2: 10%); in total, 67.5% of the 40 XAI services do not use these methods. Moreover, the methods of explanation by simplification are not used by 37.5% of the services (C11,4). In particular, model distillation (C11,2: 10%) and surrogate models (C11,3: 12.5%) are rarely offered, whereas rule extraction is used more frequently (C11,1: 40%).

The methods used in real-world XAI services often offer visualization and graphical representation. These show the influence of changes in the input data on the output and, thus, on the AI prediction. Meanwhile, methods that provide interpretations of the models themselves or explain their behavior are used less frequently. According to Barocas et al. (2020) and Crupi et al. (2021), example-based explanations are not sufficient to develop feasible measures that a user can apply. This is consistent with the few use cases of real-world XAI services that apply these methods. Archetype 7 is the only archetype to use prototypes and criticisms; one use case in this archetype is the uncovering of biomarkers for pathologists. The goal here is to uncover patterns that would be difficult to identify visually, and the user does not need further instructions for action once a discovery is made. This suggests that XAI services providing decision support with instructions for action are offered especially frequently; such methods include the modification of input data (e.g., explanation by influence methods) as described by Adadi and Berrada (2018). Services that explain the output of the AI model are offered less often; such models can only show that something should be changed but do not indicate how. A mixture of both approaches could achieve the best balance between service levels in terms of explanation and decision support. However, if the focus is on decision support, this may raise the risk of losing the explanation of how an AI model works. The acceptance of the regulatory authorities, users, and customers who are affected by AI results can decrease as a result. Therefore, to build and stabilize acceptance, it is important to pursue both explanation and decision support.

By examining real-world XAI services, we were able to determine that the group of private persons or end-consumers was not targeted except by the DataRobot service. In the case of DataRobot, decisions in the consumer domain can be enriched with explanations, but this is only because of the tool’s universal applicability and is not explicitly described in a use case. The reasons why DataRobot considers explainability to be particularly relevant to this domain are not provided. This may be because such technologies are typically purchased by companies with sufficient funds to afford them rather than by end-users, for whom such an investment would not be profitable.

Haag et al. (2022) showed that many companies are unable to exploit the potential of AI models. Our decision tree helps to provide an initial orientation. However, it is important to efficiently balance the use of AI applications, as not all corporate tasks require AI or XAI solutions and services.

XAI and ethical considerations must be regarded independently; understanding AI decision-making does not mean that the tasks performed are ethical. When analyzing features such as images and videos, it is important to consider whether these tasks are necessary, e.g., for human resource management or loan allocation. In this context, XAI models can only serve as explainable and justified support. Individual human decisions must always be included in such tasks, and human-centric needs, accountability, and decision-making must have high priority. In the health sector, potential applications that are difficult to achieve without AI solutions, such as recognizing biomarkers in computed tomography images, can be exploited.

Theoretical and practical contributions

We contribute to XAI theory by combining literature-based XAI design options with real-world XAI services and by developing a decision support framework for academics and practitioners.

Our MBox enhances the understanding of how XAI models can be designed and helps stakeholders to determine which objectives are to be targeted. It also serves as a glossary for the XAI-related vocabulary.

Our research shows how to derive a specific MBox, business model archetypes, and a decision support framework using morphological analysis, cluster analysis, and a rule-mining algorithm. Following Osterwalder et al. (2005) and Weking et al. (2020), we build on the three levels of business models: business model elements (MBox), real-world instances (XAI services), and patterns (archetypes). Based on this, we developed a decision support framework to reduce the archetypes’ complexity and to provide a simplified, strategic orientation in the domain of XAI models and their target functions.

Meanwhile, we offer a first market overview and a decision support framework to help practitioners identify the most important XAI design elements. For decision-makers such as managers and data scientists, the decision tree serves as a guide to which XAI design elements are necessary. The decision tree can be used to identify the most appropriate XAI business model and archetype. Based on this, decision-makers can refine their search in the XAI services purchasing process and filter targeted XAI methods or required input data. Even if managers want to program XAI in-house with AI developers, decision-makers can better target a project by narrowing down the development process. Furthermore, the decision tree provides an orientation about which requirements the programming should address and which developers should be engaged. In addition, decision-makers now know which design elements to incorporate and which to dispense with.

For regulators and customers affected by an XAI model’s decisions, our MBox, our archetypical business models, and our decision tree increase AI acceptance; familiarity with XAI design options and business models reduces uncertainty and fear of AI. Moreover, AI developers can use our research for initial guidance on which design options are important for their targeted tasks. XAI service providers can situate their services in the current market and take actions to innovate those services. AI service providers who want to expand their business model to XAI gain an initial overview, can explore the market before entry, and can identify opportunities and challenges. For a market entry, the decision tree can be used to select a direction with an archetype and thus align the service with the central design options.

While our MBox provides a comprehensive and complex representation of literature-based XAI design options, our decision tree offers a simplified overview of the dependencies between the most important XAI design dimensions. The seven deduced archetypical business models can be used to benchmark XAI services. In addition, the MBox can be used to develop XAI models or services by selecting one characteristic per dimension to obtain an optimal solution combination iteratively. This can facilitate project work concerning XAI model development as the solution combinations define clear targets. Our research delivers a decision support framework for XAI users, companies, and other XAI stakeholders seeking to adapt and integrate XAI solutions. This is important since many companies have limited knowledge about the potential benefits of AI and XAI solutions in regard to their corporate needs. Therefore, the questions in our decision tree can be answered with little knowledge, reducing the entrance threshold for XAI. In addition, the boxes under the decision classes are recommendations for designers or providers of XAI solutions to consider the most important design elements.

Limitations and further research

One limitation of our study is the subjectivity of our literature review and the classification of the real-world XAI services. To mitigate this limitation, all authors independently reviewed the literature and classified the real-world XAI services. The low number of classified objects due to the limited availability of XAI services is a further constraint that we balanced by integrating XAI services from several industries and application areas. Furthermore, we are unaware of how many customers the XAI services have and to what extent they are satisfied with the services. In addition, it is possible that further XAI services may not fit into one of the identified archetypical business models. Nevertheless, our MBox, archetypical business models, and decision tree can be expanded in further research. Indeed, our decision support framework serves only as a first orientation for practitioners and researchers to reduce entrance thresholds and complexity.

Our MBox, cluster analysis results, archetypical business models, and decision tree provide an extendable basis that further research can build on both quantitatively and qualitatively. As a next research step, focus group discussions can be used to evaluate our archetypical business models and our decision tree with practitioners implementing XAI for corporate processes. Moreover, further research can extend the MBox and archetypical business models by investigating the relationships between the characteristics or by developing a maturity model (Becker et al., 2009). For example, maturity levels from non-existent to optimized (Becker et al., 2009) can describe XAI models’ interpretability, input data types, XAI methods, and value creation. Our MBox offers the possibility to set a detailed research strategy, for example, by focusing exclusively on one dimension of the MBox or selecting one of the three layers. We encourage researchers to conduct more case studies on XAI models and real-world XAI services to further investigate their usefulness and applicability for particular business operations. In addition, case studies can evaluate and discuss the surplus in knowledge, explanation, and justification compared with black-box AI models, to determine whether this potential surplus is valuable or whether not knowing certain details, such as the underlying code, is acceptable for specific stakeholders or application areas.

A heat map of XAI research based on our MBox or archetypes can further contribute to theory. Here, a comprehensive literature review can be used to study less-explored but highly relevant research areas. Further research can use a matrix similar to Schoormann et al. (2021) to determine which research topics have been well explored and which research needs deserve more attention. Whether a research need actually exists must be discussed in light of practical insights, with our archetypal business models providing initial guidance. Taxonomies, as proposed by Nickerson et al. (2013), for specific XAI models or services (e.g., financial or medical XAI services) can provide a more detailed market overview for practitioners. It can also be useful to analyze which critical success factors influence XAI use across several sectors (Boynton & Zmud, 1984), identifying which challenges exist and how various real-world applications can be improved or adapted to expand XAI usage.

Conclusions

To address RQ1, which considers the literature-based morphological analysis, we identified 22 scientific publications and grouped them in a classification framework of XAI design options. We built our MBox containing three layers, 13 dimensions, and 51 characteristics. The MBox served as the basis for addressing RQ2 and RQ3, but it also makes a theoretical and practical contribution on its own. RQ2 addresses the identification of archetypical business models and decision support for identifying a suitable archetype. We classified 40 real-world XAI services covering a broad scope of application areas and deduced seven archetypical XAI business models by employing a cluster analysis. Based on our results, we developed a decision support framework in the form of our decision tree, which enables XAI stakeholders such as managers, data scientists, and AI developers to select a suitable business model that meets their requirements. To address RQ3, which focuses on the similarities and differences between research and practice, we compared the results of our cluster analysis with those of our MBox. We observed that real-world services often use XAI methods that offer recommendations through visualizations (e.g., changes in input data and their influence on the AI model) or graphical representations. Simplified AI models or numerical interpretations of the models that do not provide recommendations are used less frequently. To build and maintain the acceptance of regulatory authorities, users, and customers who are affected by AI, it is important to balance both goals: explanation and decision support. Our MBox, cluster analysis, and decision tree provide a theoretical and practical knowledge base for further theorization and applicable decision support for implementing XAI solutions in business processes and services.