1 Introduction

Data-driven AI for Computer Vision can achieve high levels of predictive accuracy, yet the rationale behind these predictions is often opaque. This paper proposes a novel explainable AI (XAI) method called CLEAR Image that seeks to reveal the causal structure implicitly modelled by an AI system, where the causes are an image’s segments and the effect is the AI system’s classification probability. The explanations are for single predictions and describe the local input–output behaviour of the classifier. CLEAR Image is based on the philosopher James Woodward’s seminal analysis of causal explanation (Woodward, 2003), which develops Judea Pearl’s manipulationist account of causation (Pearl, 2000). Together they constitute the dominant accounts of explanation in the philosophy of science. We argue that a successful explanation for an AI system should be contrastive, counterfactual and measurable.

According to Woodward, to explain an event E is “to provide information about the factors on which it depends and exhibit how it depends on those factors”. This requires a causal equation to describe the causal structure responsible for generating the event. The causal equation must support a set of counterfactuals; a counterfactual specifies a possible world where, contrary to the facts, a desired outcome occurs. The counterfactuals serve to illustrate the causal structure and to answer a set of ‘what-if-things-had-been-different’ questions. In XAI, counterfactuals usually state minimal changes needed to achieve a desired alternative outcome.

A contrastive explanation answers the question ‘Why E rather than F?’. In the philosophy literature, F is referred to as E’s foil. F comes from a contrast class of events that are alternatives to E, but which did not happen (Van Fraassen, 1980). The reason why explanations should be contrastive is captured by Hilton: “The key insight is to recognise that one does not explain events per se, but that one explains why the puzzling event occurred in the target cases but not in some counterfactual contrast case” (Hilton, 1990). When a person asks for an explanation, the relevant contrast class is often not explicitly conveyed but instead is implicit in the explanatory question. For example, when a priest asked Willie Sutton why he robbed banks, Sutton’s reply ‘Well that’s where the money is’ was not a satisfactory explanation because the priest’s implicit contrast was ‘not robbing’ but Sutton took it to be ‘robbing something else’ (Garfinkel, 1982). An explanation identifies the salient causes that led to E occurring rather than F.

For Woodward, all causal claims are counterfactual and contrastive: ‘to causally explain an outcome is always to explain why it, rather than some alternative, occurred’. Woodward’s analysis is consistent with Miller’s review of over 250 papers on explanation from philosophy, psychology and cognitive science (Miller, 2018). Miller states that perhaps his most important finding is that “Explanations are contrastive — they are sought in response to particular counterfactual cases... This has important social and computational consequences for explainable AI.”

Woodward’s theory of explanation stands in opposition to the multiple XAI methods that claim to provide counterfactual explanations (Verma et al., 2020), but which only provide statements of single or multiple counterfactuals. As this paper will illustrate, counterfactuals without a supporting causal equation will only provide incomplete explanations. Woodward’s theory also stands in opposition to XAI methods such as LIME that only provide an equation, but do not provide counterfactuals.

CLEAR Image identifies cases of ‘causal overdetermination’. The causal overdetermination of an event occurs when two or more sufficient causes of that event occur. An example from the philosophy literature is of two vandals who each throw a rock that simultaneously shatters a window, with each rock being sufficient to shatter the window. The shattering of the window is causally overdetermined (Schaffer, 2003). This causal structure may well be ubiquitous in learning systems. For example, there may be multiple patches in a medical image, any of which being sufficient by itself to cause a classification probability close to one. To the best of our knowledge, CLEAR Image is the first XAI method capable of identifying causal overdetermination.

CLEAR Image explains an image's classification probability by comparing the image with a corresponding contrast image. In this work, the contrast image is a synthetic image created by a generative adversarial network (GAN) (Goodfellow et al., 2014). The contrast between the two images is reflected in their pixel differences, and a difference mask is created by subtracting the original image from its corresponding GAN-generated image. For example, difference masks have previously been used to visualise the difference in synthetic image generation for face forgery detection (Cao et al., 2022) and for anomaly detection in medical images (Wolleb et al., 2020). These pixel differences provide useful starting segments for a contrastive explanation. However, as we will illustrate, segments identified from difference masks alone can vary significantly in their relevance to a classification; furthermore, other segments critical to the classification can often be absent from the mask. Therefore, CLEAR Image uses a novel segmentation method that combines information from the difference mask, the original image and the classifier's behaviour. After completing its segmentation, CLEAR Image identifies counterfactuals and then follows a process of perturbation, whereby segments of the original image are changed and the change in outcome is observed, to produce a regression equation. The regression equation is used to determine the contribution that each segment makes to the classification probability. It is a causal equation, with each independent variable referring to whether a particular segment is a direct cause of the classification probability. As will be shown, the explanations provided by the leading XAI methods LIME and Grad-CAM may not be reliable. CLEAR Image, therefore, measures the fidelity of its explanations, where fidelity refers to how closely an XAI method is able to mimic a classifier's behaviour. In summary, a CLEAR Image explanation specifies: segment importance scores, counterfactuals, a regression equation, segments leading to overdetermination and fidelity errors.

By providing both a statement of counterfactuals and a supporting causal equation, CLEAR Image seeks to satisfy Woodward’s specification for an explanation.

CLEAR Image was evaluated in two case studies, both involving overdetermination. The first uses a multifaceted synthetic dataset, and the second uses chest X-rays. CLEAR Image outperformed XAI methods such as LIME and Grad-CAM by an average of 31% on the synthetic dataset and 27% on the X-ray dataset (see Sect. 4.4) based on a pointing game metric defined in this paper for the case of multiple targets.

The contribution of this paper is four-fold. We introduce an XAI method that:

  • Generates contrastive, counterfactual and measurable explanations, outperforming established XAI methods in a challenging image domain;

  • Uses a GAN-generated contrast image to determine a causal equation, segment importance scores and counterfactuals;

  • Offers novel segmentation and pointing game algorithms for the evaluation of image explanations;

  • Is capable of identifying causal overdetermination, i.e. the multiple sufficient causes for an image classification.

CLEAR Image is a substantial development of an earlier XAI method, CLEAR (Counterfactual Local Explanations viA Regression), which only applies to tabular data (White and Garcez, 2020). New functionalities include: (i) the segmentation algorithm, (ii) generating perturbed images by infilling from the corresponding GAN image, (iii) a novel pointing game suitable for images with multiple targets, (iv) identification of sufficient causes and overdetermination, (v) measurement of fidelity errors for counterfactuals involving categorical features.

The remainder of the paper is organised as follows: Sect. 2 provides a summary of related work. Section 3 introduces the CLEAR Image method and algorithms. Section 4 details the experimental setup and discusses the results. Section 5 concludes the paper and indicates directions for future work.

2 Related work

This paper adopts the following notation: Let m be a machine learning system mapping each input instance x to a class label l with probability y. Each input instance x is an image that can be partitioned into S segments (regions) \(\{s_1, \dots ,s_n\}\). We use \(x'\) to denote a GAN-generated image derived from x such that \(m(x')=l\) with probability \(y'\).

The XAI methods most relevant to this paper can be broadly grouped into four types:

  1. (i)

    Counterfactual methods Wachter et al. (2017) first proposed using counterfactuals as explanations of single machine learning predictions. Many XAI methods have attempted to generate ‘optimal’ counterfactuals; for example, Karimi et al. (2020) review sixty counterfactual methods. The algorithms differ in their constraints and the attributes referenced in their loss functions (Verma et al., 2020). Desiderata often include that a counterfactual is: (1) actionable, e.g. actions are not recommended if they are physically infeasible, such as reducing a person’s age; (2) near to the original observation, with common measures including Manhattan distance, L1 norm and L2 norm; (3) sparse, changing the values of only a small number of features; (4) plausible, e.g. the counterfactual must correspond to a high-density part of the training data; (5) efficient to compute. Karimi et al. (2021) argue that these methods are likely to identify counterfactuals that are either suboptimal or infeasible in terms of their actionability. This is because they do not take into account the causal structure that determines the consequences of the person’s actions. The underlying problem is that unless all of the person’s features are causally independent of each other, then when the person acts to change the value of one feature, other downstream dependents may also change. In Sect. 5 we will explain why this criticism does not apply to CLEAR Image. In this paper, we provide a different criticism of counterfactual methods: they fail to provide satisfactory explanations because they do not provide a causal equation describing the local behaviour of the classifier they are meant to explain. Without this, they cannot identify the relative importance of different features, how the features are taken to interact with each other, or the functional forms that the classifier is, in effect, applying to each feature. They will also fail to identify cases of overdetermination.

  2. (ii)

    Gradient-based methods These provide saliency maps by backpropagating an error signal from a neural network’s output to either the input image or an intermediate layer. Simonyan et al. (2014) use the derivative of a class score for the image to assign an importance score to each pixel. Kumar et al. (2017)’s CLass-Enhanced Attention Response (also abbreviated to CLEAR) uses backpropagation to visualise the most dominant classes; it should not be confused with our method. A second approach modifies the backpropagation algorithm to produce sharper saliency maps, e.g. by suppressing the negative flow of gradients. Prominent examples of this approach (Springenberg et al., 2014; Zeiler and Fergus, 2014) have been found to be invariant to network re-parameterisation or to the class predicted (Adebayo et al., 2018; Nie et al., 2018). A third approach (Selvaraju et al., 2017; Chattopadhay et al., 2018) uses the product of gradients and activations starting from a late layer. In Grad-CAM (Selvaraju et al., 2017), the product is clamped to only highlight positive influences on class scores.

  3. (iii)

    Perturbation based methods Methods such as Occlusion (Zhou et al., 2016), Extremal Perturbation (Fong et al., 2019), FIDO (Chang et al., 2018b), LIME (Ribeiro et al., 2016) and Kernel SHAP (Lundberg and Lee, 2017) use perturbation to evaluate which segments of an image x are most responsible for x’s classification probability y. The underlying idea is that the contribution that a segment \(s_i\) makes to y can be determined by substituting it with an uninformative segment \(s_i'\), where \(s_i'\) may be either grey, black or blurred (Zhou et al., 2016; Fong et al., 2019; Ribeiro et al., 2016) or in-painted without regard to any contrast class (Chang et al., 2018b). LIME and Kernel SHAP generate a dataset of perturbed images that feeds into a regression model, which then calculates segment importance scores (LIME) or Shapley values (Kernel SHAP). Extremal Perturbation uses gradient descent to determine an optimal perturbed version of an image that, for a fixed area, has the maximal effect on a network’s output whilst guaranteeing that the selected segments are smooth. FIDO uses variational Bernoulli dropout to find a minimal set of segments that would change an image’s class. In contrast to LIME, Kernel SHAP and Extremal Perturbation, FIDO uses a GAN to in-paint segments with ‘plausible alternative values’; however, these values are not generated to belong to a chosen contrast class. Furthermore, segment importance scores are not produced.

    There are three key problems with using perturbed images to explain a classification:

    1. 1.

      A satisfactory explanation must be contrastive; it must answer ‘Why E rather than F?’ None of the above methods does this. Their contrasts are instead images of uninformative segments.

    2. 2.

      The substitution may fail to identify the contribution that \(s_i\) makes to y. For example, replacing \(s_i\) with black pixels can take the entire image beyond the classifier’s training distribution. By contrast, blurring or uninformative in-painting might result in \(s_i'\) being too similar to \(s_i\), causing the contribution of \(s_i\) to be underestimated.

    3. 3.

      A segmentation needs to be relevant to its explanatory question. Current XAI perturbation approaches produce radically different segmentations. FIDO and Extremal Perturbation identify ‘optimal’ segments that, when substituted by an uninformative segment, maximally affect the classification probability; by contrast, LIME uses a texture/intensity/colour algorithm (e.g. Quickshift (Vedaldi and Soatto, 2008)).

  4. (iv)

    Contrastive methods using GAN image synthesis Generative adversarial networks (GANs) (Goodfellow et al., 2014) have been widely applied to synthetic image generation. Image-to-image translation GANs enable a conditional transformation of an input image to a specified target. For example, CycleGAN (Zhu et al., 2017) and StarGAN (Choi et al., 2018) translate images between different domain classes. StarGAN-V2 (Choi et al., 2020) improved conditional image translation by incorporating a style vector, resulting in more scalable, higher-quality synthetic image generation across a variety of target conditions. Fixed-point GAN penalises any deviation of the image during intra-domain translation with an identity loss. DeScarGAN (Wolleb et al., 2020) incorporates this loss function in its own GAN architecture and outperformed Fixed-point GAN in its case study on identifying and localising pathology in chest X-rays. The availability of synthetic images can alleviate the constraint of data scarcity typically found in specialised domains (e.g. medical imaging). Singh and Raza (2021) and Osuala et al. (2022) have presented GANs’ applicability in the medical domain.

While the adversarial training needed by GANs is known to be challenging for (i) maintaining training stability, (ii) reaching convergence and (iii) avoiding mode collapse (Arora et al., 2022; Mescheder et al., 2018; Salimans et al., 2016; Osuala et al., 2022), many examples of properly trained GANs have been achieved (Osuala et al., 2022). Kazeminia et al. (2020) provide numerous examples of employing GANs in medical image analysis. Chang et al. (2018a) introduced the fill-in-the-dropout-region (FIDO) methods, wherein generative methods are applied for in-filling. This method, however, requires the generative model to recreate the missing regions based on the remaining unmasked features. Shih et al. (2020) emphasised the improvement in contrastive comparison when using a GAN-generated contrast image over earlier work using a uniform-value reference or a blurred input image. They modified the training of the StarGAN model (Choi et al., 2018) and demonstrated that their GAN-generated images allowed more appropriate identification of attributing features and minimised errors that can be induced by non-GAN-generated alternatives. In situations where data is scarce, it is anticipated that the benefits of GAN-based synthetic image generation would outweigh the time and effort required to train a GAN properly.

In Sect. 5 we will explain how CLEAR Image builds on the strengths of the above XAI methods but also addresses key shortcomings.

3 The CLEAR Image method

CLEAR Image is a model-agnostic XAI method that explains the classification of an image made by any classifier (see Fig. 1). It requires both an image x and a contrast image \(x'\) generated by a GAN. CLEAR Image segments x into \(S = \{s_1, \dots ,s_n\}\) and then applies the same segmentation to \(x'\), creating \(S' = \{s_1', \dots ,s_n'\}\). It then determines the contributions that different subsets of S make to y by substituting them with the corresponding segments of \(S'\). CLEAR Image is GAN agnostic, allowing the user to choose the GAN architecture most suitable to their project. A set of ‘image-counterfactuals’ \(\{c_1, \dots, c_k\}\) is also identified. Figures 1, 2, 3, 4 and 5 provide a running example of the CLEAR Image pipeline, using the same X-ray taken from the CheXpert dataset.

Fig. 1

The CLEAR Image pipeline. The GAN produces a contrast image. CLEAR Image explains the classification probability by comparing the input image with its contrast image. It produces a regression equation that measures segment scores, reports fidelity and identifies cases of overdetermination. In this example, class l is ’pleural effusion’ and its contrast class \(l'\) is ’healthy’. Using our Densenet model, the X-ray shown in this figure had a probability of belonging to l equal to 1, and its contrast image had a probability of belonging to l equal to 0

3.1 GAN-based image generation

To generate contrastive images, StarGAN-V2 (Choi et al., 2020) and DeScarGAN (Wolleb et al., 2020) have been found to be capable of generating the high-quality images needed to identify the segments of pixel differences. These GANs are therefore deployed as the network architectures for our two case studies: StarGAN-V2 for the CheXpert dataset and DeScarGAN for the synthetic dataset. These established GAN networks demonstrate how generated contrast images can aid the overall CLEAR Image pipeline in cases where real contrast images are not available. Default training hyperparameters were applied unless otherwise stated. Details of model training and hyperparameters can be found in Appendix B. For StarGAN-V2, the source image was used as input to the Style Encoder instead of a specific reference image. This ensures the generated style mimics that of the input source image. StarGAN-V2 is also not locally constrained (i.e. the network may modify any pixels in the image related to the targeted class, including irrelevant, spurious regions). A post-generation lung space segmentation step using a pre-trained U-Net model (Ronneberger et al., 2015) was therefore implemented. The original diseased lung space was replaced with the corresponding region of the generated image, with Gaussian blur applied to fuse the edges (see Fig. 2). This confines the feature identification space used by CLEAR Image to the lung space. It is an advantage of the CLEAR Image pipeline that pre-processing can be used to focus the explanation on the relevant parts of x.
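
As a rough illustration of this compositing step, the sketch below blends the GAN-generated lung region onto the original image using a feathered mask. It is a minimal sketch assuming the U-Net provides a binary lung mask and that images are grayscale arrays in [0, 1]; the function and parameter names are illustrative rather than those of our implementation.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def composite_contrast_image(original, generated, lung_mask, sigma=3.0):
    """Blend the GAN-generated lung region onto the original image.

    original, generated: float arrays in [0, 1] with identical shape (H, W).
    lung_mask: binary (H, W) array from the U-Net lung segmentation,
               1 inside the lung space, 0 elsewhere.
    sigma: width of the Gaussian feathering that softens the mask edges.
    """
    # Feather the hard mask so the pasted region fades into the original,
    # approximating the Gaussian-blur edge fusion described above.
    soft_mask = np.clip(gaussian_filter(lung_mask.astype(float), sigma=sigma), 0.0, 1.0)

    # Keep the original pixels outside the lungs; use the generated
    # 'healthy' pixels inside the lungs.
    return soft_mask * generated + (1.0 - soft_mask) * original
```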

Fig. 2

The process of generating a contrast image. An original diseased image is first used to generate a healthy contrast image with a trained GAN model. In this example, StarGAN v2 is used as the architecture. The generated healthy lung airspace is then segmented using a U-Net segmentation model and blended onto the original diseased image to produce the final image, applying Gaussian blur to minimise any edge effect around the segment boundaries

3.2 Generating contrastive counterfactual explanations

Definition 1

An image-counterfactual \(c_j\) from l to \(l'\) is an image resulting from a change in the values of one or more segments of x to their corresponding values in \(S'\) such that class\((m(x)) = l\), class\((m(c_j)) = l'\) and \(l \ne l'\). The change is minimal in that if any one of the changed segments had remained at its original value, the resulting image would still be classified as l.
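
As an illustration of Definition 1, the sketch below checks whether a candidate set of changed segments constitutes an image-counterfactual. It is a minimal sketch: `infill` and `predict_class` are hypothetical helpers (infilling the listed segments of x from \(x'\), and returning the class label assigned by m), not functions of our implementation.

```python
def is_image_counterfactual(m, x, x_prime, changed_segs, infill, predict_class):
    """Check Definition 1 for a candidate set of changed segment indices.

    infill(x, x_prime, segs): copy of x with the listed segments replaced by
        the corresponding segments of x_prime (hypothetical helper).
    predict_class(m, img): class label assigned by classifier m (hypothetical).
    """
    l = predict_class(m, x)
    candidate = infill(x, x_prime, changed_segs)
    if predict_class(m, candidate) == l:
        return False                      # the class did not flip
    # Minimality: reverting any single changed segment must restore class l.
    for s in changed_segs:
        reduced = infill(x, x_prime, [t for t in changed_segs if t != s])
        if predict_class(m, reduced) != l:
            return False
    return True
```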

CLEAR Image uses a regression equation to quantify the contribution that the individual segments make to y. It then measures the fidelity of its regression by comparing the classification probability resulting from each \(c_j\) with an estimate obtained from the regression equation.

Definition 2

Counterfactual-regression fidelity error Let \(reg(c_j)\) denote the application of the CLEAR Image regression equation given image-counterfactual \(c_j\).

$$\begin{aligned} \text {Counterfactual-regression fidelity error} = |reg(c_j) - y_{c_j}|. \end{aligned}$$

The following steps generate an explanation of prediction y for image x:

  1. 1.

    GAN-Augmented segmentation algorithm. This algorithm is based on our findings (in Sect. 4.4) that the segments (\(S_h\)) determined by analysing high-intensity differences between an image x and its corresponding GAN-generated image \(x'\) will often miss regions of x that are important to explaining x’s classification. It is, therefore, necessary to supplement segments \(S_h\) with a second set of segments \(S_l\) confined to those regions of x corresponding to low-intensity differences between x and \(x'\). \(S_l\) is created based on similar textures/intensities/colours solely within x.

    The pseudocode for our algorithm is shown in Algorithm 1. First, high and low thresholds (\(T_h\) and \(T_l\)) are determined by comparing the differences between x and \(x'\) using multi-Otsu; alternatively, the thresholds can be user-specified. \(T_h\) is then used to generate a set of segments, \(S_h\). The supplementary segments, \(S_l\), are determined by applying the low threshold, \(T_l\), to the low-intensity regions and then applying a sequence of connected component labelling, erosion and Felzenszwalb (Felzenszwalb and Huttenlocher, 2004). The combined set of segments, \(S_h\) and \(S_l\), is checked to see if any individual segment is an image-counterfactual. If none is found, an iterative process is applied to gradually increase the minimum segment size parameter. The final set of segments (S, S’) is subsequently created using the combined set (\(S_h\), \(S_l\)) as shown in Fig. 3. A rough code sketch of this step is given immediately after this list.

  2. 2.

    Determine x’s image-counterfactuals. A dataset of perturbed images is created by selectively replacing segments of x with the corresponding segments of \(x'\) (see Fig. 4). A separate image is created for every combination in which either 1, 2, 3, or 4 segments are replaced. Each perturbed image is then passed through m to determine its classification probability. All image-counterfactuals involving changes in up to four segments are then identified. (The maximum number of perturbed segments in a counterfactual is a user parameter; the decision to set it to 4 in our experiments was made as we found counterfactuals involving 5+ segments to have little additional explanatory value.)

  3. 3.

    Perform a stepwise logistic regression. A tabular dataset is created by using a {0,1} representation of the segments in each perturbed image from step 2. Consider a perturbed image \(x_{per}\). This will be composed of a combination of segments \(s_i\) from the original image x and segments \(s'_i\) from the GAN contrast image \(x'\). In order to represent \(x_{per}\) in tabular form, each segment of \(x_{per}\) that is from x is represented as a 1 and each segment of \(x_{per}\) that is from \(x'\) is represented as a 0. For example, if \(x_{per}\) consisted solely of \(\{s'_1,s_2,s_3,s_4\}\), and had a classification probability from m equal to 0.75 of being ’pleural effusion’, then this would be represented in tabular form as \(\{0,1,1,1,0.75\}\). The table of representation vectors is the input to a weighted logistic regression in which those perturbed images that are image-counterfactuals are given a high weighting and act as soft constraints. The {0,1} representations of the segments are the independent variables and the classification probability is the dependent variable. Figures 5 and 6 provide examples of the resulting logistic equation and the calculation of classification probability. A rough code sketch of steps 2–4 is given below, just before Algorithm 2.

  4. 4.

    Calculate segment importance scores. These are the regression coefficients for each segment from step 3.

  5. 5.

    Identify cases of causal overdetermination (see below).

  6. 6.

    Measure the fidelity of the regression by calculating fidelity errors (see Fig. 5) and goodness of fit statistics.

  7. 7.

    Iterate to the best explanation. In XAI there is often a trade-off between the interpretability of an explanation and its fidelity. For example, a regression equation that has two independent variables and no interaction terms is likely to be easier to interpret than a regression equation with more independent variables and several interaction terms. Because of its increased complexity, the latter regression equation might better mimic the local input–output behaviour of the AI system to be explained (i.e. it will have greater fidelity). CLEAR Image allows the user to adjust parameters such as (i) whether to include interaction terms and (ii) the maximum number of independent variables in a regression. It then reports the fidelity of the resulting explanation. In this way, the user can iterate to the explanation that they judge provides the best trade-off between interpretability and fidelity.
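
The following is a minimal sketch of the GAN-augmented segmentation of step 1, assuming grayscale images and using scikit-image. The exact ordering of the connected-component, erosion and Felzenszwalb operations, the parameter values, the counterfactual check and the iterative adjustment of the minimum segment size are simplified; it is an illustration of the idea rather than our implementation (see Algorithm 1).

```python
import numpy as np
from skimage.filters import threshold_multiotsu
from skimage.measure import label
from skimage.morphology import binary_erosion
from skimage.segmentation import felzenszwalb

def gan_augmented_segments(x, x_prime, min_size=200):
    """Sketch of GAN-augmented segmentation for grayscale images x, x_prime.

    Returns an integer label map: labels for the high-difference segments
    (S_h) plus offset labels for the supplementary segments (S_l).
    """
    diff = np.abs(x - x_prime)

    # Thresholds separating high- and low-intensity differences (multi-Otsu).
    t_low, t_high = threshold_multiotsu(diff, classes=3)

    # S_h: connected components of the high-difference regions.
    s_h = label(diff >= t_high)

    # S_l: low-difference regions, cleaned by erosion, then segmented on x
    # itself by texture/intensity using Felzenszwalb.
    low_region = binary_erosion(diff < t_low)
    fz = felzenszwalb(x, scale=100, sigma=1.0, min_size=min_size)
    s_l = np.where(low_region, fz + 1, 0)            # 0 = background

    # Combine, offsetting S_l labels so they do not clash with S_h labels.
    return np.where(s_h > 0, s_h, np.where(s_l > 0, s_l + s_h.max(), 0))
```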

Algorithm 1: GAN-Augmented segmentation (pseudocode)
Fig. 3

The GAN-Augmented segmentation algorithm. There are three stages. First, segments are identified from the high-intensity differences between the original image x and its contrast image \(x'\) (a). Second, additional segments are identified from the regions of x corresponding to low-intensity differences between x and \(x'\) (b). Third, the segments from the two steps are combined (c)

Fig. 4

Determining image-counterfactuals. In this example, segments \(s_4\) and \(s_{11}\) are evaluated both separately and in combination. Substituting \(s_{11}\) with its corresponding contrast segment \(s'_{11}\) creates a perturbed image (b) with the same classification probability as the original image (a). The same applies to segment \(s_4\) (c). However, substituting both segments \(s_4\) and \(s_{11}\) results in a perturbed image (d) which has a classification probability of 0.43. Given a decision boundary at the probability of 0.5, (d) would be classified as a ’healthy’ X-ray and would therefore be an image-counterfactual

Fig. 5

Extracts from a CLEAR Image report. The report identifies that substituting both segments 4 and 11 with the corresponding segments from its contrast image flips the classification to ’healthy’. According to the logistic regression equation, these substitutions would change the probability of the X-ray being classified as ’pleural effusion’ to 0.44. However, when these segments are actually substituted and passed through the classifier, the probability changes to 0.43; hence the fidelity error is 0.01. CLEAR Image also identifies that substituting segments 3 and 11 also creates an image-counterfactual. Note that unlike methods such as Grad-CAM, CLEAR Image is able to identify segments that have a negative impact on a classification probability

For CLEAR Image an explanation is a tuple \(\langle G, C, r, O, e\rangle\), where G are segment importance scores, C are image-counterfactuals, r is a regression equation, O are the causes resulting in overdetermination, and e are fidelity errors. The regression equation is a causal equation, with each independent variable (each referring to whether a particular segment is from x or \(x'\)) being a direct cause of the classification probability. Figure 5 shows an extract from a CLEAR Image report. Pseudocode summarising how CLEAR Image generates an explanation is provided in Algorithm 2.
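
To complement Algorithm 2, the following is a minimal sketch of steps 2–4: building the {0,1} perturbation dataset and fitting the weighted logistic regression whose coefficients are the segment importance scores. The helper `infill` and the probability-returning classifier wrapper `m` are hypothetical, the stepwise AIC selection is omitted, and a fractional-response GLM fit from statsmodels is used as a stand-in for our weighted logistic regression.

```python
import numpy as np
import statsmodels.api as sm
from itertools import combinations

def perturbation_dataset(m, x, x_prime, n_segments, infill, max_changed=4):
    """Steps 2-3: tabular {0,1} representation of the perturbed images.

    m(img): classification probability for the original class (hypothetical).
    infill(x, x_prime, segs): x with the listed segments taken from x_prime
        (hypothetical helper).
    """
    rows, probs = [], []
    for k in range(1, max_changed + 1):
        for changed in combinations(range(n_segments), k):
            x_per = infill(x, x_prime, changed)
            # 1 = segment from x, 0 = segment from the contrast image x'.
            rows.append([0 if i in changed else 1 for i in range(n_segments)])
            probs.append(m(x_per))
    return np.array(rows, dtype=float), np.array(probs)

def fit_causal_regression(rows, probs, decision_boundary=0.5, cf_weight=10.0):
    """Steps 3-4: weighted logistic fit; coefficients = importance scores."""
    X = sm.add_constant(rows)
    # Up-weight perturbed images whose probability crosses the decision
    # boundary (i.e. image-counterfactuals) so they act as soft constraints.
    weights = np.where(probs < decision_boundary, cf_weight, 1.0)
    model = sm.GLM(probs, X, family=sm.families.Binomial(), var_weights=weights)
    return model.fit().params        # [intercept, per-segment scores ...]
```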

Algorithm 2: Generating a CLEAR Image explanation (pseudocode)

The causal overdetermination of an effect occurs when multiple sufficient causes of that effect occur. By default, CLEAR Image only reports sufficient causes which each consist of a single segment belonging to S. Substituting a sufficient cause for its corresponding member in \(S'\) guarantees the effect. In the philosophy of science, it is generally taken that for an effect to be classified as overdetermined, it should be narrowly defined, such that all the sufficient causes have the same, or very nearly the same impact (Paul, 2009). Hence for the case studies, the effect is defined as \(p(x \in diseased)> 0.99\), though the user may choose a different probability threshold. A sufficient cause changes a GAN-generated healthy image to a diseased image. This is in the opposite direction to CLEAR Image’s counterfactuals, whose perturbed segments flip the classification to ’healthy’. Sufficient causes can be read off from CLEAR Image’s regression equation. Using the example in Fig. 6 with the logistic formula, a classification probability of > 0.99 requires \({{\varvec{w}}}^T{{\varvec{x}}} > 4.6\). The GAN healthy image corresponds to all the binary segment variables being equal to 0. Hence, \({{\varvec{w}}}^T{{\varvec{x}}}\) is equal to the intercept value of \(-4.9\), giving a probability of \((1+e^{4.9})^{-1} \approx 0.01\). If a segment \(s_i'\) is now replaced by \(s_i\), the corresponding binary variable changes to 1. Hence if segment 9 is infilled, then Seg09 = 1 and \({{\varvec{w}}}^T{{\varvec{x}}} = 6.8\) (i.e. \(11.7 - 4.9\)). Similarly, infilling just segment 11 will make \({{\varvec{w}}}^T{{\varvec{x}}} > 4.6\). Either substitution is sufficient to guarantee \({{\varvec{w}}}^T{{\varvec{x}}} > 4.6\), irrespective of any other changes that could be made to the values of the other segment variables. Hence segments 9 and 11 are each a sufficient cause leading to overdetermination.
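
A minimal sketch of how single-segment sufficient causes, and hence overdetermination, might be read off from the fitted logistic equation is given below. The worked numbers above (intercept \(-4.9\), segment coefficient 11.7) correspond to the simple check intercept + coefficient > 4.6; the sketch adds a conservative worst-case allowance for any negatively weighted segments, which is our reading rather than a detail stated in the text.

```python
import numpy as np

def sufficient_causes(intercept, coefs, threshold=0.99):
    """Single-segment sufficient causes from the logistic regression.

    Each binary segment variable is 1 if the segment comes from the original
    image x and 0 if it comes from the contrast image x'. Segment i is a
    sufficient cause if infilling it alone guarantees a class probability
    above `threshold`, whatever is done to the other segments.
    """
    logit = np.log(threshold / (1.0 - threshold))     # ~4.6 for 0.99
    causes = []
    for i, w_i in enumerate(coefs):
        # Worst case over the remaining segments: segments with negative
        # coefficients could also be switched to their x values.
        worst_others = sum(min(0.0, w_j) for j, w_j in enumerate(coefs) if j != i)
        if intercept + w_i + worst_others > logit:
            causes.append(i)
    return causes, len(causes) >= 2      # overdetermination if two or more
```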

By contrast, XAI methods such as LIME and Kernel SHAP cannot identify cases of overdetermination. This is because they use simple linear regression instead of logistic regression. For example, suppose that an image has three segments: \(s_1, s_2, s_3\). In the regression dataset, each segment infilled from x has a value of 1 and each segment infilled from \(x'\) has a value of 0. LIME/Kernel SHAP’s regression equation will have the form: \(y = k_1s_1 +k_2s_2 + k_3s_3\). In the case of LIME, y is meant to be the classification probability and the regression coefficients (\(k_1, k_2, k_3\)) are the feature importance scores. Let us suppose there is overdetermination, with segments \(s_1\) and \(s_2\) each being a sufficient cause for x to be in a given class (e.g. ’pleural effusion’) with more than 0.99 probability. Hence, the regression equation should set y to a value greater than 0.99 not only when \(s_1 = s_2 = 1\), but also when either \(s_1 = 1\) or \(s_2 = 1\). This is clearly impossible with the above linear form (and the constraint that \(y \le 1\)). Mutatis mutandis, the same argument applies to Kernel SHAP.
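
To make the contrast concrete, the following worked example uses made-up coefficients (an illustration, not values from the experiments):

$$\begin{aligned}&\text {Linear form: } y = k_1 s_1 + k_2 s_2. \text { Overdetermination requires } k_1> 0.99 \text { and } k_2> 0.99, \\&\text {but then } y\big |_{s_1 = s_2 = 1} = k_1 + k_2> 1.98> 1, \text { violating } y \le 1. \\&\text {Logistic form: } y = \bigl (1+e^{-(w_0 + w_1 s_1 + w_2 s_2)}\bigr )^{-1}, \text { e.g. } w_0 = -5, \ w_1 = w_2 = 10, \\&\text {gives } y \approx 0.993 \text { for either segment alone and } y \approx 1 \text { when both are present.} \end{aligned}$$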

Fig. 6

Overdetermination. The report identifies segments 9 and 11 as each sufficient to have caused the original X-ray to be classified as ‘pleural effusion’ with a probability greater than 0.99. Hence this is a case of causal overdetermination. The corresponding GAN-generated image \(x'\) has a classification probability \(\approx 0\) for pleural effusion. If a perturbed image \(x_{per}\) was created by substituting all the segments of the original image x with the corresponding segments of \(x'\) except for segment 9, then \(x_{per}\) would still have a classification probability for pleural effusion greater than 0.99. The same would apply if only segment 11 was substituted

4 Experimental investigation

There are two case studies, the first using a synthetic dataset, the second analysing pleural effusion X-rays taken from the CheXpert dataset (Irvin et al., 2019). Transfer learning was used to train both a VGG-16 with batch normalisation and a DenseNet-121 classifier for each dataset. CLEAR Image was evaluated against Grad-CAM, Extremal Perturbations and LIME. The evaluation consisted of both a qualitative comparison of saliency maps and a comparison of pointing game and intersection over union (IoU) scores. CLEAR Image’s fidelity errors were also analysed (none of the other XAI methods measures fidelity).

4.1 Datasets

The synthetic dataset’s images share some key characteristics found in medical imaging, including: (i) different combinations of features leading to the same classification and (ii) irrelevant features. All images (healthy and diseased) contain a set of concentric circles, a large ellipse and a small ellipse. An image is ‘diseased’ if either: (1) the small ellipse is thin-lined and the large ellipse contains a square, or (2) there is a triangle and the large ellipse contains a square. The dataset is an adaptation of Wolleb et al. (2020).
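
The labelling rule, as stated above, can be written as a single boolean expression; the sketch below is illustrative (the flag names are ours, not the dataset generator's):

```python
def is_diseased(small_ellipse_thin: bool, square_in_large_ellipse: bool,
                has_triangle: bool) -> bool:
    """Synthetic dataset labelling rule as described in the text."""
    return square_in_large_ellipse and (small_ellipse_thin or has_triangle)
```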

CheXpert is a dataset of chest X-rays with automated pathological label extraction from radiology reports, consisting of 224,316 radiographs of 65,240 patients in total. Images were extracted just for the classes ‘pleural effusion’ and ‘no finding’. Misclassified images and images significantly obstructed by supporting devices were manually filtered out. A random frontal X-ray image per patient was collected. In total, a dataset of 2,440 images was used in this work for model training, validation and testing. Appendix A.2 details the data preparation process. A hospital doctor provided the ground-truth annotations for the pleural effusion X-rays used in our case study.

4.2 Evaluation metrics

This paper uses two complementary metrics to evaluate XAI methods. Both require annotated images identifying ‘target’ regions that should be critical to their classification. A pointing game produces the first metric, which measures how successfully a saliency map ‘hits’ an image’s targets. Previously, pointing games have been designed for cases where (i) images have single targets and (ii) the saliency maps have a maximum-intensity point (Fong et al., 2019; Zhang et al., 2018). By contrast, this paper’s case studies have multiple targets, and the pixels within each CLEAR Image segment have the same value. We therefore formulated a novel pointing game. The pointing game partitions a ‘diseased’ image into 49 square segments, \(P = \{p_1, \ldots, p_{49}\}\), and identifies which squares contain each of the targets. The corresponding saliency map is also partitioned, and each square is allocated a score equal to the average intensity of that square’s pixels, \(Q = \{q_1, \ldots, q_{49}\}\). The pointing game then starts with the \(q_i\) of highest intensity and determines if the corresponding \(p_i\) contains a relevant feature. A successful match is a ‘hit’ and an unsuccessful match is a ‘miss’. This process continues until every target has at least one hit. The score for an image is the number of hits over the number of hits plus misses. Pseudocode is provided in Algorithm 3.
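
A minimal sketch of the multi-target pointing game is shown below, assuming the image dimensions are divisible by the 7 × 7 grid and that target locations are supplied as sets of grid-square indices; the handling of squares that touch an already-hit target follows our reading of Algorithm 3 rather than a rule stated in the text.

```python
import numpy as np

def pointing_game_score(saliency, target_squares, grid=7):
    """Multi-target pointing game (cf. Algorithm 3).

    saliency: (H, W) saliency map with H and W divisible by `grid`.
    target_squares: dict mapping each target id to the set of grid-square
        indices (row-major, 0..grid*grid-1) containing that target.
    """
    h, w = saliency.shape
    # Average saliency per grid square (Q in the text).
    q = saliency.reshape(grid, h // grid, grid, w // grid).mean(axis=(1, 3)).ravel()

    hit_targets, hits, misses = set(), 0, 0
    for idx in np.argsort(q)[::-1]:          # squares by decreasing saliency
        touched = [t for t, squares in target_squares.items() if idx in squares]
        if touched:
            hits += 1
            hit_targets.update(touched)
        else:
            misses += 1
        if len(hit_targets) == len(target_squares):
            break                            # every target has at least one hit
    return hits / (hits + misses)
```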

The second metric is IoU. Each pixel in a saliency map is classified as ‘salient’ if its intensity is above 70% of the maximum intensity in that map. IoU then measures the overlap between the ‘salient’ pixels \(pix^{salient}\) and the pixels belonging to the image’s targets \(pix^{target}\): \(IoU = |pix^{salient} \cap pix^{target}| / |pix^{salient} \cup pix^{target}|\). The 70% threshold was identified empirically: it maintains a relatively high IoU score by balancing a high intersection with \(pix^{target}\) and a small union of pixel regions against a large enough \(pix^{salient}\) (see Appendix A.1 for details).
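
A corresponding sketch of the IoU metric is given below; it reads the threshold as 70% of the map's maximum intensity, which is our interpretation of the wording above.

```python
import numpy as np

def iou_score(saliency, target_mask, threshold_fraction=0.7):
    """IoU between 'salient' pixels and annotated target pixels."""
    salient = saliency > threshold_fraction * saliency.max()
    target = target_mask.astype(bool)
    union = np.logical_or(salient, target).sum()
    return np.logical_and(salient, target).sum() / union if union else 0.0
```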

Both metrics are useful, but each has counterexamples. For example, IoU would give too high a score to a saliency map that strongly overlapped with a large image target but completely missed several smaller targets that were also important to a classification. Applied together, however, the two metrics provide a good indication of an XAI method’s performance.

Algorithm 3: Multi-target pointing game (pseudocode)

4.3 Experimental runs

CLEAR Image was run using logistic regression with the Akaike information criterion; full details of testing and parameter values can be found in Appendix B.3. The test datasets consisted of 95 annotated X-rays and 100 synthetic images. The average running time for CLEAR Image was 20 s per image for the synthetic dataset and 38 s per image for the CheXpert dataset, running on a Windows i7-8700 RTX 2070 PC. Default parameter values were used for the other XAI methods, except for the following beneficial changes: Extremal Perturbations was run with the ‘fade to black’ perturbation type, using areas {0.025, 0.05, 0.1, 0.2} with the masks summed and a Gaussian filter applied. LIME was run using Quickshift segmentation with kernel sizes 4 and 20 for the CheXpert and synthetic datasets respectively.

4.4 Experimental results

CLEAR Image outperforms the other XAI methods on both datasets (Fig. 7a). Furthermore, its fidelity errors are low, indicating that the regression coefficients are accurate for the counterfactually important segments (Fig. 7b). Figure 7c illustrates some of the benefits of using the ‘Best Configuration’, which uses GAN-augmented segmentation and infills using \(x'\). This is compared with (i) segmenting with Felzenszwalb and infilling with \(x'\), (ii) segmenting with GAN-augmented segmentation but infilling with black patches, and (iii) segmenting with Felzenszwalb and infilling with black patches. Figure 8 illustrates how CLEAR Image’s GAN-augmented segmentation leads to a better explanation than just using a difference mask. CLEAR Image’s performance was similar for VGG-16 and DenseNet; therefore, only the DenseNet results are presented unless otherwise stated.

Fig. 7

Evaluation metrics. a Compares the performances of different XAI methods with the DenseNet models. b Shows the fidelity errors for the DenseNet models. c Compares the performances of different configurations of CLEAR Image. The bars show 95% confidence intervals

Fig. 8

GAN-Augmented Segmentation versus GAN difference mask. The difference mask identifies four segments, but when CLEAR Image perturbs these, the two nearest to the top are found to be irrelevant. Of the other two segments, CLEAR Image identifies the segment it colours green to be far more important to the classification probability

Fig. 9

Extracts from a CLEAR Image report for a synthetic image. The regression equation shows that Seg05 is a necessary but insufficient cause of the image being classified as diseased

CLEAR Image’s regression equation was able to capture the relatively complex causal structure that generated the synthetic dataset. Figure 9 shows an example. A square (SQ) is a necessary but insufficient cause for being diseased. An image is labelled as diseased if there is also either a triangle (TR) or the small ellipse is thin-lined (TE). When SQ, TR and TE are all present in a single image, there is a type of overdetermination in which TR and TE are each a sufficient cause relative to the ‘image with SQ already present’. As before, a diseased image corresponds to the binary segment variables equalling one and a classification probability of being diseased \(>0.99\) requires \({{\varvec{w}}}^T{{\varvec{x}}} > 4.6\). This can only be achieved by Seg 5 (corresponding to SQ) plus at least one of Seg 2 or Seg 7 (TE, TR) being set to 1 (i.e. being present). Figure 10 compares the saliency maps for synthetic data.

Fig. 10

Comparison of XAI methods on synthetic data. The pointing game scores are shown in green and the IoU scores are in purple. The maps illustrate how CLEAR Image and LIME are able to tightly focus on salient regions of an image compared to broadbrush methods such as Grad-CAM and Extremal. The significance of a patch is indicated by its red intensity

For the CheXpert dataset, Fig. 11 illustrates how CLEAR Image allows for a greater appreciation of the pathology compared to ‘broad-brush’ methods such as Grad-CAM (please see Appendix A.1 for further saliency maps). Nevertheless, the IoU scores highlight that the segmentation can be further improved. For CheXpert’s counterfactuals, only 5% of images did not have a counterfactual with four or fewer \(s'\) segments. Most images required several segments to be infilled before their classification flipped to ‘healthy’: 17% required one segment, 30% two segments, 24% three segments and 24% four segments. 17% of the X-rays were found to be causally overdetermined.

Fig. 11

Comparison of XAI methods on X-ray. The pointing game scores are shown in green and the IoU scores are in purple. The significance of a patch is indicated by the intensity of red against the blue-outlined annotated ground truth

5 Discussion, conclusion and future work

With AI systems for images being increasingly adopted in society, understanding their implicit causal structures becomes paramount. Yet the explanations provided by XAI methods cannot always be trusted, as the differences in the saliency maps of Fig. 11 exemplify. It is therefore important that an XAI method should measure its fidelity. By ‘knowing when it does not know’, it can alert the user when its explanations are unfaithful.

CLEAR Image recognises that a difference mask is only the starting point for an explanation. In the experiments reported in this paper, CLEAR Image uses a GAN-generated image both for infilling and as input to its own segmentation algorithm. As discussed below, other approaches are possible when the segmentation can be defined in advance with the use of prior knowledge as in the case of brain scans. This is under investigation.

We have shown that CLEAR Image can illuminate cases of causal overdetermination. Many other types of causal structures may also be ubiquitous in AI. For example, causal preemption and causal clustering are well documented within the philosophy of science (Baumgartner, 2009; Schaffer, 2004). The relevance of these to XAI creates an area of future work.

The examples in this paper help illustrate our claim that XAI counterfactual methods will often fail to provide satisfactory explanations of a classifier’s local input–output behaviour. This is because a satisfactory explanation requires both counterfactuals and a supporting causal equation. It is only because CLEAR Image produces a causal equation that it is able to identify (a) segment importance scores, including identifying segments with negative scores (Fig. 5), (b) segments that are necessary but insufficient causes (Fig. 9), (c) cases of overdetermination (Fig. 6). Providing only counterfactuals is insufficient; imagine another science, say physics, treating a statement of counterfactuals as being an explanation, rather than seeking to discover the governing equation. Perhaps the primary benefit of XAI counterfactual methods is in suggesting sets of actions. But as we noted in Sect. 2 and argued in Karimi et al. (2021), such methods may identify counterfactuals that are suboptimal or infeasible in terms of their actionability. This criticism does not apply to CLEAR Image because CLEAR Image’s purpose is to explain the local input–output behaviour of a classifier, and the role of its counterfactuals is (i) to illustrate the classifier’s causal structure (at the level of how much each segment can cause the classification probability to change) and (ii) answer contrastive questions. Hence, if the explanatory question is “why is this image classified as showing a zebra and not a horse?”, CLEAR Image might highlight the stripes on the animal as being a cause of the classification. Whilst this might be a satisfactory explanation of the classification, it is, of course, not actionable.

Methods such as LIME and Kernel SHAP bear some similarity to CLEAR Image as they also use a dataset of perturbed images to feed a regression. However, these methods do not use a GAN-generated image and do not report fidelity. Also, these methods assume that a classification probability is a simple linear addition of its causes. This is incorrect for cases of causal overdetermination and CLEAR Image, therefore, uses a sigmoid function.

A key limitation of CLEAR Image is its reliance on a contrast image, both for infilling and for guiding segmentation. The contrast image needs to be aligned with the target image so that the perturbed images are correctly infilled. In this paper’s experiments, CLEAR Image uses a GAN-generated contrast image. However, there is a data availability constraint for custom training of a GAN, especially in specialised domains. Training stability and convergence, as well as mode collapse, are also common concerns during GAN training. Nevertheless, it may still be possible to obtain contrast images through other means. For example, in human neuroimaging, AI systems are often trained using registered and normalised MRI scans (Pölsterl et al., 2021). In such cases, a contrast image can simply be selected from images belonging to the required contrast class. In cases where a contrast image cannot be obtained, CLEAR Image can use the same infilling (black/blurred) and external segmentation methods used by LIME. CLEAR Image will then be expected to have similar fidelity to LIME but, critically, unlike LIME it will report its fidelity, so the user will know whether the explanation corresponds to the underlying model.

Another possible limitation could be the understandability of CLEAR Image to non-technical users. A user study should now be carried out. Such studies are time- and resource-consuming and need to be devised carefully by experts within specific application domains to produce sound and reliable results. Instead, we have focused on objective measures and evaluations of fidelity, which in our view should precede any user study. Future work will also include adapting CLEAR Image to the multimodal neural networks now being used in human neuroimaging, where contrast images can be readily obtained without using a GAN. There are brain atlases for these registered images (e.g. https://atlas.brainnetome.org) which provide neurologically meaningful segments. Another area of work will be to extend our analysis of overdetermination to other types of causal structures.