
1 Introduction

The widespread adoption of artificial intelligence (AI) has had a profound impact on various sectors, including healthcare [4], finance [9], and many other fields [13]. As AI systems become more prevalent, the need for explainability and interpretability of their decision-making and decision-support processes becomes increasingly important. Explainable AI (XAI) aims to develop methods that make AI systems and their decisions understandable for human stakeholders [26], which requires fulfilling diverse expectations and requirements [36].

In this paper, we address the issue of explainability in the context of corporate AI applications, focusing on internal processes and practical applications [23]. We present a real-world case study where an AI system is employed to optimize an Industry 4.0 production process without considering XAI requirements. This case study enables us to identify situations where explainability is crucial and explore how business-oriented XAI requirements can be formulated and prepared for integration into process design [8, 10, 35].

Building on prior work in general models and non-functional software requirements, we provide insights into the importance of explainable AI in smart manufacturing and formal business process development and into how corresponding requirements can be incorporated. This study is a resource for researchers and practitioners seeking to better understand the role of explainable AI in practice.

1.1 Related Work

That the lack of transparency, explainability, and understandability (or, more generally, perspicuity, as per [60]) in opaque AI models, known as black boxes [5], can lead to serious consequences is widely recognized [25, 26, 65]. In high-stakes contexts like healthcare and criminal justice, the consequences of using AI that lacks perspicuity become especially apparent, as decisions based on flawed or biased algorithms may harm individuals or communities on a large scale. Accordingly, XAI research dedicated to developing methods that make such systems understandable promises to help master various black-box-induced challenges [26, 27, 49, 62].

In recent years, various methods and approaches have been developed to address the explainability of AI models, making them more interpretable and understandable for human stakeholders (for recent reviews, see [3, 40, 59]). Techniques such as local surrogate models like LIME [16, 52] and SHAP [12, 41] have emerged as popular tools for model-agnostic post-hoc interpretability. Another approach is to employ counterfactual explanations [63, 64], which provide insights into AI decisions by presenting alternative scenarios where the outcome would have differed. Other techniques like TCAV or approaches exploiting the attention mechanisms are model-specific and try to look into the models’ inner processes [34, 47]. Alternatives such as [33] aim to explain models globally, i.e., to explain them (and their properties) as a whole rather than individual outputs.

XAI methods, however, are not an end in themselves but merely a means to an end. Thus, ever more and ever new XAI methods do not necessarily represent the solution to the identified black-box-induced challenges. From an application-oriented perspective, the usefulness of XAI methods hinges on the particular use context; in order to evaluate the suitability of XAI methods, one must understand the requirements, expectations, and needs of the relevant stakeholders that they are meant to fulfill [17, 35, 36, 51, 58, 61]. Most considerations regarding XAI requirements take either a broad societal perspective [15, 19, 56] or the perspective of specific stakeholders [21, 29, 38], especially in high-stakes fields [20, 48, 50]. Typical examples of concrete expectations connected to XAI are the improvement of unfairness detection [7, 14, 30], the resolution of accountability questions, the filling of responsibility gaps [6, 44, 57], and the improvement of trustworthiness assessments [32, 42, 43, 55].

A long-standing consensus acknowledges the growing importance of explainability as AI systems become increasingly widespread. However, the significance of explainability is not restricted to societal contexts but extends to development and plays an instrumental role in AI-supported business processes, including manufacturing and production. Nevertheless, there remains a lack of concrete examples that practically substantiate these abstract considerations and clarify how to deal with the corresponding XAI requirements. More specifically, despite the successful application of XAI techniques in various contexts, the significance of XAI in AI-driven production processes within companies remains insufficiently explored. Although international conferences have hosted workshops on the role of XAI in industry [22, 28], books have featured it [11], and experts consider it central to the advancement of Industry 4.0 [2], tangible, real-world case studies demonstrating how to address explainability requirements in practice remain a rarity.

2 The Case Study

In the following, we will use a concrete case to show which XAI requirements one should consider when designing and planning AI-supported production processes. We will begin by presenting a real example in which the actual optimization goal was achieved while specific user requirements were overlooked. We then use this case as the basis for an XAI-oriented analysis.

2.1 The Case

Initial Situation. Our use case is the factory of a large car manufacturer. The manufacturing process of a car is complex. Before assembly can take place, several components must be produced and prepared. While the bare bodies and some components are produced separately by the manufacturer, additional parts are produced by external suppliers and delivered to the factory. Finally, the workers assemble all parts of the vehicle. In what follows, we will look at the last part of that process.

The assembly of passenger cars involves two steps (for an overview, see Fig. 1): In the first step, the workers prepare the base bodies. More specifically, they add doors and paint them. In the second step, the final assembly, all other parts are attached, from windows to the engine, transmission, axles, exhaust system and, finally, the interior.

During final assembly, the order in which workers assemble cars is crucial for the production process’s effectiveness. The challenge becomes more complex as conveyors automatically move the bodies from one workstation to another on a fixed cycle. If a car part is missing or the work at some station goes too slowly, production has to stop. Concerning the latter issue, it is vital to be aware that the cars have different characteristics, which pose different difficulties in the assembly process. If complex, difficult-to-assemble cars with similar characteristics are repeatedly assembled one after the other, the entire production may come to a standstill.

To optimize the production order, the car factory has a line buffer connecting the two stages. This buffer holds up to 15 bodies per line in up to 27 lines. When utilizing the line buffer, decision-makers must determine the line on which to store a car arriving from the base body assembly and painting area and the line from which to retrieve the next car for final assembly.

The second decision, from which line to take the next car body for final assembly, is made entirely manually by a human supervisor. Experienced employees make these decisions by trying to follow, in a quasi-algorithmic way, a number of rules based on the stored cars’ features.
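To make the decision setting more concrete, the following Python sketch models the buffer state and a purely illustrative rule-of-thumb outfeed choice. The buffer dimensions (27 lines with up to 15 bodies each) are taken from the case description; the body attributes, the scoring, and the specific rules are hypothetical assumptions, not the factory’s actual rule set.

```python
from dataclasses import dataclass, field

NUM_LINES = 27      # buffer lines (from the case description)
LINE_CAPACITY = 15  # painted bodies per line (from the case description)

@dataclass
class CarBody:
    body_id: str
    assembly_effort: int  # 1 (easy) .. 5 (hard); hypothetical difficulty score
    waiting_time: int     # cycles spent in the buffer so far

@dataclass
class LineBuffer:
    # one FIFO list of CarBody per buffer line
    lines: list = field(default_factory=lambda: [[] for _ in range(NUM_LINES)])

    def heads(self):
        """Bodies that can currently leave the buffer: the front of each non-empty line."""
        return [(i, line[0]) for i, line in enumerate(self.lines) if line]

def rule_of_thumb_outfeed(buffer: LineBuffer, recent_efforts: list[int]) -> int:
    """Pick the line whose head body best satisfies two hypothetical rules of thumb:
    avoid several hard-to-assemble bodies in a row, and prefer long-waiting bodies."""
    def score(candidate):
        _, body = candidate
        # penalize hard bodies if the last three outfeeds were already demanding
        penalty = body.assembly_effort if sum(recent_efforts[-3:]) > 9 else 0
        return body.waiting_time - penalty
    line_idx, _ = max(buffer.heads(), key=score)
    return line_idx
```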

Fig. 1.

The illustration depicts the on-site situation at the factory. The upper production line delivers the bodies that result from the first assembly step, where the bare bodies are pre-assembled and painted. The production line at the bottom leads to the final assembly. In between, buffering takes place in a line buffer that consists of 27 linear buffers with space for up to 15 painted bodies each. The AI agent described here makes recommendations for the outfeed, i.e., it recommends which painted bodies should go onto the conveyor belt of the final assembly line.

AI-Driven Solution. Employees can only apply and monitor the application of a limited number of relatively simple rules. Therefore, the approach co-developed by some of the authors of this paper aims at optimizing the second decision, i.e., which car body should go from the line buffer to final assembly, with the help of a learned AI agent. In addition to increasing efficiency through better and faster decision-making, the approach also aims to reduce the workload and cognitive load of the supervisors, who were not to be replaced but supported. To accomplish this, an AI agent was trained using reinforcement learning. The training proceeded through self-play, without human supervision, and was based on a simulation of the line buffer and the subsequent final assembly line. In order to derive appropriate rewards for reinforcement learning from the previously used rule-of-thumb-based method and these simulations, two approaches were combined (a schematic sketch of the resulting reward follows the list):

  1.

    Based on the existing rules of thumb, a large number of rules, hierarchically ordered by importance, were defined. Rule violations incur negative rewards for the AI agent, with more severe punishments assigned to more critical rules. The AI agent was trained to minimize these punishments.

  2.

    Based on a model of the final assembly line, one can simulate the production time, including possible events that would lead to a production stop. The AI agent was trained to maximize the number of cars produced, i.e., among other things, to minimize the time for which production has to be stopped.
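As an illustration of how these two reward sources could be combined, consider the following schematic Python sketch. The rule names, penalty weights, and the simulator interface are hypothetical assumptions for illustration only, not the reward design actually used in the project.

```python
from dataclasses import dataclass
from typing import Callable

# Rules of thumb ordered by importance; more critical rules carry harsher penalties
# (names and weights are hypothetical).
RULE_PENALTIES = {
    "never_block_final_assembly": 100.0,
    "limit_consecutive_hard_bodies": 10.0,
    "prefer_long_waiting_bodies": 1.0,
}

@dataclass
class SimResult:
    cars_finished: int    # cars leaving final assembly within the simulated horizon
    stoppage_cycles: int  # cycles during which the line stood still

def combined_reward(state, action,
                    violates: Callable[[str, object, object], bool],
                    simulate: Callable[[object, object], SimResult]) -> float:
    """Reward for one outfeed decision: negative rewards for violated rules of thumb
    (approach 1) plus a throughput term from the simulated assembly line (approach 2)."""
    penalty = sum(weight for rule, weight in RULE_PENALTIES.items()
                  if violates(rule, state, action))
    sim = simulate(state, action)
    return (sim.cars_finished - sim.stoppage_cycles) - penalty
```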

Even the early results of the first experiments were impressive: in a toy example [24], the authors showed that using an AI agent can significantly decrease stopping times. Comparing historical data, i.e., the results of real decisions about the order in which cars were produced, with the simulated results based on the AI suggestions showed significantly increased plant productivity. Initial experience after deployment at the factory validated these experimental results – even though hurdles were encountered in practice.

Observation: A Lack of Adoption. In consultation with factory management and the works council, the system was never designed to make decisions fully automatically, i.e., independently. Instead, it was designed to make recommendations to a human overseer, i.e., a supervisor in the loop, who would then decide on their own responsibility. However, when the system was deployed, it was observed that supervisors tended not to follow the system’s recommendations, at least when those differed significantly from the decisions they would have made themselves.

Personal conversations in the field confirmed our suspicion that blindly following AI-based recommendations that do not match one’s own assessment is perceived as unsatisfactory and therefore tends to be rejected. Although this is only anecdotal evidence that should be substantiated by qualitative and quantitative studies in future work, it may nevertheless be regarded as sufficient motivation for the present work. This is all the more true because links between acceptance and trust on the one hand and XAI on the other have also been postulated in other theoretical papers (see also the related work above and our analysis below).

Taking a step back, it becomes clear: we have discovered a step in an existing production process that can be significantly optimized through AI and have developed a corresponding solution. What should have been considered from the start, however, is that using an AI system creates new human-machine relationships and alters existing relationships between stakeholders. Only if these relationships and the associated interactions are taken into account can AI use unfold its true potential.

3 Identifying XAI Requirements

Examples like the case described above show that merely incorporating AI applications into existing business processes is not enough to reach the desired goals. It is crucial to realize that the potential change in the overall dynamics and decision-making within the process when introducing AI applications is reason to rethink the entire process rather than simply adding AI components or replacing parts of the process with them. AI may offer new insights and perspectives that challenge or improve traditional business decision methods. However, this might necessitate reevaluating and restructuring the process to fully unlock AI’s capabilities. Thus, only true integration of AI, rather than the mere automation of tasks without considering broader implications and effects within the modified process, results in optimized outcomes. Most importantly, as we incorporate AI into our processes, it is crucial to consider the impact of this technology on communication and collaboration between humans and machines, as well as the overall effectiveness of the adapted process.

For AI components to be effectively utilized, one must adapt the process to the new requirements that emerge due to AI implementation in specific contexts, especially when human-computer collaboration becomes an integral part of the new overall system or process. Understanding these new requirements correctly, or identifying them in the first place, takes considerable effort. In particular, anticipating such requirements requires experience and systematic approaches. However, if one runs into new challenges after already having deployed an AI system (as in our case), it is crucial to analyze the observed deficiencies. Such situations are valuable opportunities to learn from mistakes for future projects.

The present case lends itself to such an analysis, which is the main contribution of our work. We want to figure out how to design better and more efficient AI-driven business processes, especially concerning issues of adoption. Since we have no reason to believe that supervisors reject AI recommendations in principle, we hypothesize that the lack of adoption is due to a lack of information and understanding on the human side. This assumption is by no means intended to diagnose a lapse on the part of the human but to point to a flaw in the system. Apart from other transparency-related questions (cf. [60]), we assume a need for more explainability. As a result, the interaction between humans and machines, necessitated by the deployment of the AI components, introduces at least one new technical requirement: the need for AI to be explainable to the relevant human stakeholder.

Of course, this vague statement offers little concrete guidance for designing better and more efficient AI-driven business processes, specifically regarding adoption. Aiming for more transparency and explainability means making concrete software engineering decisions. To tackle these challenges systematically, we build upon the groundwork laid by [36] (see also Fig. 2). We then translate the understanding gained in this way into concrete explainability requirements in the sense of [35]. By synthesizing the insights from both works, we aim to take steps toward a more comprehensive understanding of the challenges and opportunities presented by AI integration in business processes.

Fig. 2.

The model of [36] that we use as the starting point of our analysis. We start by identifying the relevant stakeholders, which we consider the critical context factor. What the authors call “desiderata” we call “needs”, and we derive them from the stakeholders. Furthermore, we do not consider concrete explanatory approaches, i.e., concrete XAI methods, but consider what kind of explanatory information seems expedient to fulfill the identified needs of the relevant stakeholders.

Following [36], we first need to understand which de facto needs (or “desiderata”, as they are called there) of the human stakeholders involved were not met and how these needs connect to what XAI can deliver before we can identify suitable XAI methods. More specifically, that interdisciplinary work investigates stakeholders’ desiderata (needs) in the context of explainability approaches. It starts from the observation that much of the XAI literature claims that certain XAI methods help to meet specific needs. The authors emphasize the lack of clarity about how exactly explanatory approaches can help satisfy the cited stakeholder needs. They provide a model that explicitly outlines the essential concepts and relationships to consider when evaluating, adjusting, selecting, and developing explainability approaches. We utilize this model to inform our analysis of stakeholders’ needs in the context of AI explainability. In particular, the work of these authors prompts us to begin by looking at the concrete needs of stakeholder groups, to ask under what contextual conditions these needs would be met, and to ask how, in principle, different XAI methods can help to achieve this.

Once these relationships have been understood, however, tangible forms are needed in practice to unambiguously express the requirements. This includes defining the conditions under which the requirements should be considered fulfilled and determining how to check whether these conditions are met. The findings of [35], which investigates the elicitation, specification, and verification of explainability as a Non-Functional Requirement (NFR), offer a good starting point for doing so. That work starts by observing that, while there is a growing demand for explainability, a systematic and overarching approach to ensure explainability by design remains elusive. It provides a conceptual analysis that unifies different notions of explainability and the corresponding demands, using a hypothetical hiring scenario as an example. In our work, we build upon the rough-and-ready explainability requirements outlined in this research as a starting point for our use case, aiming to further refine and concretize the approach through practical application.

3.1 Systematic Analysis

In order to systematically derive empirically testable hypotheses about which requirements result from the use of our AI system, we draw on the model from [36] (see also Fig. 2). To do this, we analyze the situation in four steps. The first step is to determine the relevant stakeholder groups as crucial contextual factors.

We omit other factors beyond stakeholder identity because we could not identify any further critical factors (e.g., no time pressure).

We then attempt to identify the specific practical needs of these stakeholder groups, focusing on the supervisors and those of their needs that may explain the lack of adoption of AI recommendations. Thirdly, we consider the types of explanatory information that address the identified needs. Finally, the XAI methods that can provide this information are presented.

Stakeholders. To ensure the successful implementation and operation of an AI system within an assembly line setting, understanding and addressing the needs of the following three key stakeholder groups is essential:

  • Supervisors. Supervisors are responsible for deciding which car bodies leave the line buffer. Importantly, it is they who decide whether to follow the AI’s recommendation.

  • Assembly Line Workers. The AI-supported decisions directly affect workers on the assembly line. They may encounter unexpected or potentially frustrating configurations.

  • Management. Managers must comprehend the underlying reasons in case of incidents and, most importantly, production halts.

All of these roles significantly influence the system’s overall performance and are, in a sense, affected by the deployment of our AI system. Arguably, one can foster a more efficient, productive, and transparent AI-driven environment by catering to their requirements and addressing their concerns.

Needs. Upon closer examination, we find relatively similar needs for each class of stakeholder, which can thus be subsumed under three more general categories: needs for (adequate) trustworthiness assessment, needs for responsibility, and needs for improvement. Table 1 presents a general overview of the individual needs of supervisors, workers, and management we identified and shows which categories they fall into. Below, we list the three relevant needs of supervisors whose frustration could plausibly explain the lack of adoption of our system:

  • Trustworthiness Assessment. A lack of trust in the AI system and its results could explain why supervisors often do not follow the generated recommendations. Drawing from the literature on organizational trust [45] and trust in automation [37], there could be several reasons for such a lack of trust. For example, the system could perform poorly, revealing its insufficient capability and thus disqualifying it from being a trustworthy system. Since our AI-driven system has been shown to perform better than the previous approach, this is not a plausible explanation. Alternatively, the user’s lack of trust might be a case of decalibrated trust [37], i.e., the incorrect assessment of a trustworthy system as non-trustworthy. Again, this could be explained by several factors [55], and the most likely one in our case is inadequate access to relevant cues and information on the part of the supervisor. In other words, our system needs to provide more information so that supervisors can appropriately develop trust in it. As long as such information is missing, it is rational, especially in the absence of experience, not to trust the system when its recommendations do not match one’s expectations.

  • Responsibility. The human supervisor should not be a mere button pusher but a responsible decision-maker. This is not only the factory management’s expectation but is also likely to correspond to the supervisors’ self-image. However, our system could give the human in the loop an insufficient sense of self-efficacy, and it might indeed put them in a position where they cannot – and should not – take responsibility for their decisions. By this, we mean that if supervisors deviate from the traditional rules and their gut feeling in favor of AI recommendations, they may fear not being able to justify their decisions. Especially in the case of incidents like production stops, pointing to the computer and telling management that the computer said so is undoubtedly unsatisfactory. No one wants to find themselves in such a situation, so supervisors might decide to follow the old rules rather than the AI recommendations. If so, the situation of the supervisors would be a typical instance of merely apparent responsibility in the presence of de facto responsibility gaps [6, 44].

  • Improvement. Supervisors should regularly review the AI system’s performance and contribute to its ongoing improvement. This involves assessing accuracy and reliability, analyzing data, comparing AI-generated recommendations with their traditional approaches, and identifying biases or errors. By continually staying informed about the AI system’s strengths and weaknesses, supervisors can enhance their own skills, enabling them to make more informed decisions regarding the utilization of AI assistance in the future. This adaptation allows them to respond effectively to changing circumstances within the production context.

Table 1. Stakeholders’ needs. In the paragraph on the stakeholders’ needs, we explain the supervisors’ needs, those in the top row, in more detail. Their non-fulfillment could explain the lack of adoption of AI recommendations.

Investigating even just these three needs of the supervisors in full, let alone all nine defined needs (cf. Table 1), would go beyond the scope of this work. As we believe that the most plausible barrier to using AI recommendations is that supervisors do not feel they can arrive at well-informed decisions themselves, we consider the need for responsibility to be the best explanation for the observed lack of adoption. Thus, we restrict the further analysis to what we take to be the heart of that need.

In order to complete the analysis in terms of [36], we next need to ask what explanatory information is expedient for meeting the needs of the respective stakeholders to be addressed. From this, we can make a first educated guess concerning suitable XAI methods.

Explanatory Information. The model of [36] requires us to ask what understanding is required on the part of the stakeholders in order to fulfill their individual needs. Only then can one ask what information is needed so that the corresponding understanding can be obtained, and actually will be obtained, in practice. Of course, we cannot base such an analysis on fully developed, sophisticated theories of understanding from Philosophy or Psychology. However, this is not necessary for practical application, mainly because we cannot directly observe or measure understanding as a latent structure anyway. Instead, we ask ourselves how the respective understanding might show up in the behavior or capacities of a supervisor. What is it that the supervisor cannot do in the present case but should be able to do so that one can rightly say that the individual need is fulfilled?

In order to understand what is missing to enable supervisors to do this, it is helpful to take their perspective. Supervisors have so far made decisions based on specific rules of thumb and their gut feeling, which has developed from experience and is difficult to verbalize, explicate, and explain. This form of intuitive knowledge need not be inferior to explicable knowledge (a fact also known as Polanyi’s paradox). Now, AI-based decision recommendations have entered the picture, and they have quite similar characteristics. Remember: our AI system’s recommendations result from a reinforcement learning process based on self-play in which both said rules of thumb and the results of simulations (which might be understood as formalized work experience) go hand in hand. Similar to the decisions of humans, however, the system’s recommendations cannot be easily cast into explanations. Only when these two different forms of implicit knowledge conflict does a problem arise. The following case distinction illustrates this:

  • Either the recommendations are consistent with the rules of thumb and could be explained by them. In that case, however, they should already be similar to what the supervisor would decide. So this case is not interesting, as it cannot be the basis for the observed lack of adoption.

  • Alternatively, the AI’s recommendations apparently conflict with the rules of thumb. In this case, the recommendation either conforms to what the supervisor would do or it does not:

    • In the first case, there is no disparity between the recommendation and the decision. Again, this case is of no interest as there is no disagreement between humans and machines.

    • However, if the system’s recommendation does not correspond to what the supervisor would do, a relevant case is found.

Why should supervisors follow the recommendations of the system in that last case? If they follow their own feeling, they can take responsibility, as they can rationalize and verbalize their decision or at least partially justify it. However, if they follow the system’s recommendation, the most they can do is point to the AI. In this case, they must fear others questioning whether they are essential to the decision process at all. In short, they would be mere “vicarious agents” of the AI, i.e., mere button-pushers.

This consideration leads to the fundamental question that the supervisor should be able to answer: why was car A and not car B (or C or D or ...) chosen as the next car for the conveyor? To answer this question, they must know why the AI suggested one car body and not another. Suppose the supervisor followed the recommendation of the AI. The supervisor must then be able to answer in the following form: “I would have chosen body X rather than A. However, the AI suggested choosing body A. At first, I was surprised, but in fact I learned from the system that E would have obtained otherwise.” Here, E should refer to a valid reason that supports choosing A instead of X. For example, E could be that choosing X would have made a production stoppage more likely. Alternatively, E could be that A has been waiting in the buffer for a very long time and is blocking a buffer line in which other car bodies are waiting that must remain selectable later for effective operation. From this, we conclude that at the core of the supervisor’s need for responsibility lies the following expectation:

  • Reasoned Decision-Making: Being able to weigh the pros and cons of AI recommendations against the pros and cons of one’s own judgments.

Thus, the required explanatory information should be contrastive. Contrastive explanations focus on comparing the outcome of interest with a reference outcome, highlighting the differences between the two. This approach seeks to answer questions like, “Why did the model choose outcome A over outcome B?” Contrastive explanations emphasize the specific features or factors that led the model to prefer one outcome over another. By providing this comparative perspective, contrastive explanations offer users a better understanding of the decision-making process and the relative importance of different factors.

Contrastive explanations are not necessarily counterfactual explanations. Counterfactual explanations typically explain what minimal input feature variations would have led to other outputs. These explanations answer questions like, “What would need to change in the input for the model to produce a different outcome?” Counterfactual explanations present hypothetical situations in which the model’s decision would have differed if certain conditions or input features had been altered. It is generally assumed that such dependence shows which part of the input is “causally most relevant” for the specific output. This approach helps grasp the sensitivity of the model’s decisions to specific factors and understand the impact of those factors on the decision-making process.

However, counterfactual explanations can also be understood on the side of outputs rather than inputs: as explanations in terms of what would otherwise have been the case. This is precisely the kind of explanation we need in this case: we need to show the supervisor the extent to which their favored choice would lead to different consequences than following the system’s recommendation – and why the recommendation’s consequences would be better than those of the supervisor’s favored choice. Supervisors can either accept this explanation, which is necessarily based on the results of two different simulation runs in the factory’s digital twin, or reject it based on their experience and factual information. (Note that in practice the information available to a human supervisor may well exceed that available to the AI system. For example, the supervisor may recognize that the workers at a particular workstation are performing better than usual or that a workstation is understaffed today and is therefore modeled as too efficient in the simulation. This kind of assessment is precisely what the supervisor is there for as a human in the loop.)

In summary, contrastive explanations in XAI emphasize the differences between two outcomes, while counterfactual explanations explore alternative scenarios by modifying input features in order to better explain AI models’ decision-making processes.

In our use case, a contrastive and counterfactual explanation, contrasting the consequences of the option the supervisor would choose with the consequences of following the AI recommendation, seems particularly useful for the supervisor. Why choosing a body from one line is preferable to drawing a body from another can depend on highly non-trivial dependencies, for instance, on possible future choices and further uncertainties. This might make generating good explanations computationally expensive and their presentation non-trivial. However, supervisors need to be in a position to understand such dependencies, at least approximately, if they are meant to base their decisions on the pros and cons of the different alternatives.
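A minimal sketch of how such a contrastive, simulation-based explanation could be assembled is shown below. The outcome fields, the two-run comparison against the factory’s digital twin, and the wording templates are illustrative assumptions; an actual implementation would have to be tied to the real simulation model.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Outcome:
    cars_finished: int        # simulated throughput over a fixed horizon
    stoppage_minutes: float   # simulated downtime
    max_waiting_time: int     # longest time any body remains stuck in the buffer

def contrastive_explanation(ai_choice: str, supervisor_choice: str, state,
                            simulate: Callable[[object, str], Outcome]) -> str:
    """Contrast the simulated consequences of the AI's recommendation with those of the
    body the supervisor would have picked, using two roll-outs in the digital twin."""
    with_ai = simulate(state, ai_choice)
    with_sup = simulate(state, supervisor_choice)

    reasons = []
    if with_ai.stoppage_minutes < with_sup.stoppage_minutes:
        reasons.append(
            f"choosing {supervisor_choice} would have risked about "
            f"{with_sup.stoppage_minutes - with_ai.stoppage_minutes:.0f} additional minutes of stoppage")
    if with_ai.max_waiting_time < with_sup.max_waiting_time:
        reasons.append(
            f"{ai_choice} frees a buffer line blocking bodies that must remain selectable later")
    if not reasons:
        return f"No clear advantage found; {supervisor_choice} appears at least as good."
    return f"{ai_choice} was recommended over {supervisor_choice} because " + " and ".join(reasons) + "."
```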

This leaves the question of what kind of XAI approaches we need.

XAI Methods. While the specific choice and design of a concrete XAI approach are beyond the scope of this paper, the above considerations do make clear that post-hoc approaches to local explanation should be considered:

Ante-Hoc vs. Post-Hoc. Ante-hoc approaches focus on creating transparent and explainable systems that do not require additional procedures for understanding their inner workings or outputs. Models like decision trees, rule-based models, and linear approximations are inherently explainable, but this may only be useful for stakeholders with specific expertise. Using inherently explainable models often comes with a loss of accuracy.

Post-hoc approaches focus on extracting explanatory information from a system’s underlying model, which is usually a black-box, i.e., an inherently opaque model.

Since, in our case, the AI system, a deep neural network, already exists and works very well, a post-hoc approach is called for. Two reasons in particular suggest that there is no need to switch to ante-hoc explainable models. Firstly, the considerations above suggest that the kinds of explanations to be generated are likely to be simulation-based. Secondly, those considerations suggest that the specific decision criteria of the AI, if such a thing is encoded in a separable way in the internal structure of the deep neural network at all, are likely to be less relevant. Even if such criteria are absent, or present but not extractable (which, in the end, has not even been proven to be impossible), this does not necessarily speak against combining inherently opaque models with post-hoc XAI techniques.

Local vs. Global. In the field of XAI, local and global explanation approaches serve different purposes when it comes to understanding AI models. Local explanations provide insights into specific outputs and should make it easier to comprehend why the model arrived at a specific conclusion.

Global explanations, on the other hand, provide an overall understanding of the AI model’s behavior and decision-making process. They seek to convey a general understanding of how the model operates, including its patterns, relationships, and rules. Global explanations are valuable when stakeholders need a holistic view of the model or aim to identify potential biases, trends, or systemic issues in the model’s functioning. In summary, local explanations clarify individual decisions made by an AI model, while global explanations offer a broader view of the model’s behavior and decision-making patterns.

Considering that our primary objective is to enable the supervisor to understand individual system decisions and justify them (to themselves, the management, or the workers), we focus our attention on local explanations in this context.

3.2 Derivation of Requirements and Conditions for Success

The findings listed above are the result of a systematic analysis. However, they need a more formal structure in order to be explicitly fed into further development or used to design business processes. [35] proposed the following general structure for explainability requirements that are meant to serve such purposes:

Definition 1 (Explainability Requirement)

A system S must be explainable for target group G in context C with respect to aspect Y of explanandum X.

This definition emphasizes two essential points: first, a system must be understandable to a particular target group (the stakeholders) in a particular context (which, remember, is assumed to be given in our case by the stakeholders’ role alone); and second, an explanation always refers to a particular aspect of what is to be explained (i.e., the explanandum). In the present case, the explanandum is not the AI system itself but its outputs, i.e., we are interested in local explanations. The aspect of the recommendation we are after is what makes it better than what the supervisor would otherwise choose, e.g., in the absence of the recommendation. Hence, we can derive the following specific requirement:

Definition 2 (Explainability Requirement: Supervisor’s Responsibility)

The AI-based recommendation system must be explainable for supervisors with respect to what makes the system’s recommendation better than the options the supervisor may favor initially.
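To indicate how such a requirement could be captured in a machine-readable form and fed into further development, the following sketch encodes Definitions 1 and 2 as a simple record. The field names and the concrete strings are our own illustrative choices, not a prescribed notation.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class ExplainabilityRequirement:
    system: str        # S: the system that must be explainable
    target_group: str  # G: the stakeholders for whom it must be explainable
    context: str       # C: the usage context
    explanandum: str   # X: what is to be explained
    aspect: str        # Y: the aspect of X the explanation must address

# Definition 2 expressed in this structure (context given by the supervisor role).
supervisor_responsibility = ExplainabilityRequirement(
    system="AI-based outfeed recommendation system",
    target_group="supervisors",
    context="outfeed decision at the line buffer",
    explanandum="an individual recommendation",
    aspect="what makes it better than the option the supervisor initially favors",
)
```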

Finally, we need a success criterion, i.e., we need to specify the circumstances under which a certain kind of explanatory information fulfills the requirement. As previously mentioned, ensuring that the explanation is appropriate for the stakeholder is crucial. For this, we again make use of [35], which, quite similarly to the model of [36], emphasizes both the stakeholder-relativity of explanations and the importance of understanding for measuring success:

Definition 3 (Explainable For)

E is an explanation of explanandum X with respect to aspect Y for target group G, in context C, if and only if the processing of E in context C by any representative R of G makes R understand X with respect to Y.

So for our case-specific purposes, we get the following instance:

Definition 4 (Explainable For: Supervisor’s Responsibility)

E is an explanation of the AI-based system’s recommendations with respect to what makes the system’s recommendation better than the options the supervisor may favor initially for the target group of supervisors if and only if the processing of E by any specific supervisor makes them understand the AI-based system’s recommendation with respect to this contrastive regard.

As before, we do not need a detailed theoretical account of the concept of understanding, as [35] aims at a practical, capability-oriented approach, precisely as argued for in the previous section. Ultimately, such an approach should allow for carrying out concrete user studies to determine whether the requirement is fulfilled:

Analyzing explanations in terms of understanding [...] enables leveraging results from psychology and the cognitive sciences to assess whether something is an explanation and how people react to different explanations. In particular, tying explainability to an understanding eventually enables verification through studies conducted with the relevant stakeholders [...] Overall, the idea is to enable the examination of explainability by measuring understanding, e.g., in psychological studies of whether the processing of specific explanations makes stakeholders understand the explanandum concerning the relevant aspect in relevant contexts. [35, pp. 365]

Thus, structured interviews and observational studies with supervisors should be conducted to assess the suitability of some approaches in the direction sketched in the last subsection. For instance, one should observe how the supervisors decide and ask them to justify their decisions for a given set of specific example choices, together with explanations.

If these interviews reveal that the supervisors’ choices are highly inaccurate and inefficient, or that their answers are insufficiently based on the given explanations, the specific explanation approach obviously does not meet the requirement and needs to be revised.

4 Conclusion

This paper examined a real-world case study highlighting the importance of explainability in AI-driven decision-making processes. For this, we investigated the actual application of AI in a smart manufacturing context in which insufficient consideration was given to the new human-machine interaction arising from the AI deployment. In particular, we focused on the observed lack of adoption of the AI-generated recommendations by human supervisors.

To understand how to remedy this, we used the model of [36] and identified a set of stakeholders and their respective needs induced by AI deployment. We explained to what extent these needs are frustrated by a lack of explainability of the AI recommendations and how this frustration arguably explains the observed lack of adoption. From this, we inferred plausible types of explanatory information and identified potentially requisite types of explanatory methods. Based on this, we derived concrete explainability requirements based on prior work from [35] and described how to test their fulfillment in studies. However, carrying out the outlined program of work is beyond the scope of this paper.

In the following, we briefly point out how the knowledge and insights collected throughout this case study can be incorporated into the future development of AI-based business processes so that corresponding emerging explainability requirements can be anticipated rather than identified after deployment when problems arise. We then provide a brief outlook on future work that could build on our case study.

4.1 XAI in Business Process Management and Business Process Model and Notation

Our analysis has shown that it is necessary to incorporate XAI techniques to ensure adequate trust, enable responsible decisions, and allow for continuous improvement, and that one can systematically identify the corresponding requirements “from the armchair”. By anticipating stakeholders’ needs to understand AI-generated results at the design time of business process modeling, in the form of explainability requirements as described in our work, the emergence of new hurdles, such as the lack of adoption discussed here, can be prevented or at least minimized from the outset. In light of these findings, considering XAI in the early stages of business process modeling is essential for organizations to realize the full potential of AI systems and address potential challenges. We are not alone in this view, even though the integration of XAI into business process modeling is still in its infancy [18, 31].

Integrating XAI early on in the modeling process promises several advantages:

  • Adequate Trust and Acceptance: Understanding the rationale behind AI-driven decisions enables and fosters adequate trust (in the sense of [55]) and acceptance among stakeholders towards the AI system, including different groups of employees, managers, and regulators. Transparent decision-making processes can lead to better collaboration and more widespread adoption of AI systems within organizations.

  • Improved Decision-Making Quality: Incorporating XAI techniques during the design phase allows for more informed decision-making, as users can evaluate the AI system’s reasoning, assumptions, and potential biases. This ensures that AI-generated outcomes align with organizational goals and possibly also moral standards.

  • Facilitating Compliance and Auditing: Ensuring transparency and explainability in AI systems from the outset aids in meeting regulatory requirements and addressing potential legal or ethical concerns. Organizations can more effectively demonstrate compliance with data protection and AI ethics regulations by explaining their AI-driven decisions.

  • Simplified Debugging and Optimization: The early integration of XAI enables easier identification of issues or shortcomings in AI systems, allowing for more efficient debugging and optimization processes. This, in turn, results in better-performing AI models that are tailored to meet specific business objectives.

  • Streamlining Knowledge Transfer: XAI facilitates knowledge transfer and employee training, offering a deeper understanding of the AI system’s decision-making processes. This empowers employees to work effectively with AI systems and ensures the organization’s smooth integration of AI technologies.

In conclusion, considering XAI from the initial stages of business process modeling is crucial for organizations seeking to harness the full potential of AI systems while addressing potential challenges. By integrating XAI early on, organizations can enhance trust, improve decision-making quality, and facilitate compliance, leading to more effective and ethically sound AI-driven business processes.

4.2 Future Work

In our future work, we primarily envision three areas of focus:

Firstly, conducting empirical studies involving decision-makers and stakeholders from our real-life case example is imperative. This will allow us to test our current theoretical considerations in practice and gain valuable insights into the practical implications of our findings.

Secondly, explainability can be explicitly integrated into the design of AI-driven processes. How and in what form can explainability requirements already be considered in the design of business processes? This may also require extensions to the corresponding modeling languages like BPMN.

Thirdly, our real-world case study shows that a solid framework for building studies would help test the fulfillment of the derived explainability requirements. How could appropriate studies be systematically derived directly from the requirements? For instance, which kinds and types of abilities typically correspond to which needs of which sorts of stakeholders? How do they relate to the various XAI methods and the various sorts of explanatory information? Moreover, how could such studies be executed without employing entire teams of organizational psychologists? No such framework or guiding principles exist – although, of course, the need for such studies has long been recognized and their general structure has been described [17, 28, 54], a number of specific studies have been conducted (for a relatively recent overview, see [53]), and both contextual (and stakeholder) relevance and the need for interdisciplinarity have been emphasized [1, 39, 46]. While we hope to have underscored all of this in the context of industrial and business processes in this work, we intend future work to tie the development of such studies more closely to explicit non-functional requirements such as those outlined here.