Evaluating the Stability of Semantic Concept Representations in CNNs for Robust Explainability

Mikriukov, Georgii; Schwalbe, Gesina; Hellert, Christian; Bade, Korinna

doi:10.1007/978-3-031-44067-0_26

Part of the book series: Communications in Computer and Information Science ((CCIS,volume 1902))

Included in the following conference series:

World Conference on Explainable Artificial Intelligence

956 Accesses
3 Citations

Abstract

Analysis of how semantic concepts are represented within Convolutional Neural Networks (CNNs) is a widely used approach in Explainable Artificial Intelligence (XAI) for interpreting CNNs. A motivation is the need for transparency in safety-critical AI-based systems, as mandated in various domains like automated driving. However, to use the concept representations for safety-relevant purposes, like inspection or error retrieval, these must be of high quality and, in particular, stable. This paper focuses on two stability goals when working with concept representations in computer vision CNNs: stability of concept retrieval and of concept attribution. The guiding use-case is a post-hoc explainability framework for object detection (OD) CNNs, towards which existing concept analysis (CA) methods are successfully adapted. To address concept retrieval stability, we propose a novel metric that considers both concept separation and consistency, and is agnostic to layer and concept representation dimensionality. We then investigate impacts of concept abstraction level, number of concept training samples, CNN size, and concept representation dimensionality on stability. For concept attribution stability we explore the effect of gradient instability on gradient-based explainability methods. The results on various CNNs for classification and object detection yield the main findings that (1) the stability of concept retrieval can be enhanced through dimensionality reduction via data aggregation, and (2) in shallow layers where gradient instability is more pronounced, gradient smoothing techniques are advised. Finally, our approach provides valuable insights into selecting the appropriate layer and concept representation dimensionality, paving the way towards CA in safety-critical XAI applications.

You have full access to this open access chapter, Download conference paper PDF

Playing to distraction: towards a robust training of CNN classifiers through visual explanation techniques

Article 09 July 2021

Computer Vision Explainability for Object Detection in Safety Surveillance

Expressive Explanations of DNNs by Combining Concept Analysis with ILP

Keywords

1 Introduction

Advancements in deep learning in the last decade have led to the ubiquitous use of deep neural networks (DNNs), in particular CNNs, in computer vision (CV) applications like object detection. While they exhibit state-of-the-art performance in many fields, their decision-making logic stays opaque and unclear due to their black-box nature [4, 44]. This fact raises concerns about their safety and fairness, which are desirable in fields like automated driving or medicine. These demands are formalized in industrial standards or legal regulations. For example, the ISO26262 [1] automotive functional safety standard recommends manual inspectability, and the General Data Protection Regulation [13] as well as the upcoming European Union Artificial Intelligence Act [43] both demand algorithm transparency. The aforementioned concerns are subject of XAI.

XAI is a subfield of AI that focuses on revealing the inner workings of black-box models in a way that humans can understand [5, 27, 37]. One approach involves associating semantic concepts from natural language with internal representations in the DNN’s latent space [37]. In computer vision, a semantic concept refers to an attribute that can describe an image or image region in natural language (e.g., “pedestrian head”, “green”) [10, 23]. These concepts can be associated with vectors in the CNN’s latent space, also known as concept activation vectors (CAVs) [23]. Post-hoc CA involves acquiring and processing CAVs from trained CNNs [2, 23, 30], which can be used to quantify how concepts attribute to CNN outputs and apply it to verification of safety [35] or fairness [23]. However, in literature two paradigms of post-hoc CA have so far been considered separately, even though they need to be combined to fully compare CNN learned concepts against prior human knowledge. These paradigms are: supervised CA, which investigates pre-defined concept representations [10, 23, 35], and unsupervised CA, which retrieves learned concepts [11, 49] and avoids expensive labeling costs. Furthermore, current XAI approaches are primarily designed and evaluated for small classification and regression tasks [2, 35], whereas more complex object detectors as used in automated driving require scalable XAI methods that can explain specific detections instead of just a single classification output.

Besides adaptation to object detection use-cases, high-stakes applications like safety-critical perception have high demands regarding the quality and reliability of verification tooling [19, Chap. 11]. A particular problem is stability: One should obtain similar concept representations given the same CNN, provided concept definitions, and probing data. Instable representations that vary strongly with factors like CA initialization weights [31] or imperceptible changes of the input [40] must be identified and only very cautiously used. Stability issues may arise both in the retrieval of the concept representations, as well as in their usage. Retrieval instability was already identified as an issue in the base work [23], and may lead to concept representations of different quality or even different semantic meaning for the same concept. Instability in usage may especially occur when determining local concept-to-output attribution. In particular, the baseline approach proposed by Kim et al. [23] uses sensitivity, which is known to be brittle with respect to slight changes in the input [40, 41].

This work tackles the aforementioned problems of OD-ready supervised and unsupervised CA, and measurement and improvement of stability in CA retrieval and attribution. Concretely, to solve these problems, we propose an XAI framework based on supervised and unsupervised CA methods for ODs. The unsupervised method is used to automatically mine concept samples, which are jointly used for supervised concept analysis with manually labeled concepts. Furthermore, stability metrics are suggested and tested. The respective main contributions of our work are:

Proposal of two metrics and methodology for testing of concept retrieval stability and concept attribution stability in CA;
Experimental study of stability influence factors in six diverse CNN models with different backbones with the main findings that CAV dimensionality reduction may improve stability, and that gradient smoothing may be beneficial for concept attribution stability in shallow layers;
Adaptation of supervised and unsupervised concept-based analysis methods for CA on common ODs;
Introduction of a post-hoc, label-efficient, concept-based explainability framework for classifiers and ODs allowing for concept stability estimation (Fig. 1).

In the following, we will first take a look at related work on concept analysis in Sect. 2. Our approaches for combining supervised and unsupervised CA, for CA in OD, and for stability measurement are then detailed in Sect. 3. Our experimental setup can be found in Sect. 4 with results detailed in Sect. 5.

2 Related Work

This section presents an overview of relevant supervised and unsupervised CA methods. Comprehensive XAI and CA surveys can be found in [4, 36, 44].

2.1 Supervised Concept Analysis

There are two primary paradigms in supervised CA methods: scalar-concept representation [6, 25, 34] and vector-concept representation [3, 10, 23]. Scalar concept representations refer to disentangled deep neural network (DNN) layer representations with a one-to-one correspondence between neurons and distinct semantic concepts. A prominent example and base work are Concept Bottleneck Models [25] (CBM). These introduce an interpretable bottleneck layer to DNNs by assigning each neuron to a specific concept, i.e., scalar-concepts. An extension CBM-AUC [34], enhances the model’s capability by automatically learning unsupervised concepts (AUC) that describe the residual variance of the feature space. In contrast to the previous examples, Concept Whitening [6] is a post-hoc approach towards scalar-concepts. It transforms a feature space of a layer and reduces redundancy between neurons, making it more likely for each neuron to correspond to a single concept. IIN [9] is another post-hoc approach that trains an invertible neural network to map a layer output to a disentangled version, using pairwise labels. However, standard CNNs are typically highly entangled [22]. Hence, such scalar-concept approaches have to enforce the disentangled structure during training or utilize potentially non-faithful proxies [29]. Furthermore, they are limited to explaining a single layer.

Vector-concepts, on the other hand, associate a concept with a vector in the latent space. The base work in this direction still disregarded the distributed nature of CNN representations: The Network Dissection approach [3] aims to associate each convolutional filter in a CNN with a semantic concept. Its successor Net2Vec [10] corrects this issue by associating a concept with a linear combination of filters, resulting in a concept being globally represented by a vector in the feature space, the concept activation vector (CAV) [23]. A sibling state-of-the-art method for associating concepts with latent space vectors is TCAV [23], which also uses a linear model attached to a CNN layer to distinguish between neurons (in contrast to filters as in Net2Vec) relevant to a given concept and the rest. TCAV also proposes a gradient-based approach that allows for the evaluation of how sensitive a single prediction or complete class is to a concept. The concept sensitivity (attribution) for a model prediction is calculated by taking the dot product between the concept activation vector and the gradient vector backpropagated for the desired prediction. These vector-concept baselines for classification (TCAV) and segmentation (Net2Vec) of concepts have been extended heavily over the years, amongst others towards regression concepts [14, 15], multi-class concepts [21], and locally linear [46, 47] and non-linear [21] CAV retrieval. However, the core idea remained untouched.

While the TCAV paper already identifies stability as a potential issue, they reside to significance tests for large series of experiments leaving a thorough analysis of stability (both for concept retrieval and concept attribution) open, as well as investigation of improvement measures. Successor works tried to stabilize the concept attribution measurement. For example, Pfau et al. [30] do not use the gradient directly, but the average change of the output when perturbing the intermediate output towards the CAV direction in latent space in different degrees. This gradient stabilization approach follows the idea of Integrated Gradients [41], but no other approaches like Smoothed Gradients [40] have been tried. Other approaches also suggest improved metrics for global concept attribution [15]. However, to our knowledge, stability remained unexplored so far.

We address this gap by utilizing TCAV as a baseline global concept vector representation for the stability estimation. Moreover, as gradient-based method, it be adapted to estimate concept attributions in other model types, such as ODs (see Sect. 3.2). It is important to note that our stability assessment method is not limited to TCAV and can potentially be applied to evaluate the stability of other global concept representations.

2.2 Unsupervised Concept Analysis

Unsupervised methods for analyzing concepts are also referred to as concept mining [36]. These methods do not rely on pre-defined concept labels, but the acquired concepts are not always meaningful and require manual revision. There are two main approaches to concept mining: clustering and dimensionality reduction. Clustering methods, such as ACE [12] and VRX [11] group latent space representations of image patches (superpixels), obtained through segmentation algorithms. The resulting clusters are treated as separate concepts and can be used for supervised concept analysis. Invertible Concept Extraction (ICE) [49] is a dimensionality reduction method based on non-negative matrix factorization. It mines non-negative concept activation vectors (NCAVs) corresponding to the most common patterns from sample activations in intermediate layers of a CNN. The resulting NCAVs are used to map sample activations to concept saliency maps, which show concept-related regions in the input space.

To reduce the need in concept labeling, we opted to use ICE for unsupervised concept mining due to (1) its superior performance regarding interpretability and completeness of mined concepts compared to clustering [49], and (2) its simpler and more straightforward pipeline with less hyperparameters. Unlike ACE, it does not rely on segmentation and clustering results as an intermediate step, which makes it easier to apply.

2.3 Concept Analysis in Object Detection

There are only a few existing works that apply concept analysis methods to object detection, due to scalability issues. In [35] the authors adapt Net2Vec for scalability to OD activation map sizes, which is later used to verify compliance of the CNN behavior with fuzzy logical constraints [38]. Other TCAV-based works apply lossy average pooling to allow large CAV sizes [7, 14], but do not test OD CNNs. However, these methods are fully supervised and require expensive concept segmentation maps for training, resulting in scalability issues regarding concept label needs. In order to reduce the need for concept labels, we propose adapting and using a jointly supervised and unsupervised classification approach for object detection, and investigate the impact of CAV size on stability. This also closes the gap that, to our knowledge, no unsupervised CA method has been applied to OD-sized CNNs so far.

3 Proposed Method

The overall goal targeted here is a CA framework that allows stable, label-efficient retrieval and usage of interpretable concepts for explainability of both classification and OD backbones. To address this, we introduce a framework that combines unsupervised CA (for semi-automated enrichment of the available concept pool) with supervised CA (for retrieval of CAVs and CNN evaluation) together with an assessment strategy for its stability properties. An overview of the framework is given in the following in Sect. 3.1, with details on how we adapted CA for OD in Sect. 3.2. Section 3.3 then presents our proposal of CAV stability metrics. Lastly, one of the potential influence factors on stability, namely CAV dimensionality and parameter reduction techniques, is presented in Sect. 3.4.

3.1 Stability Evaluation Framework

The framework depicted in Fig. 1 aims to efficiently combine supervised and unsupervised CA methods for use in explainability or evaluation purposes, like our CA stability evaluation. To achieve this it (1) builds an extensible Concept Pool containing human-validated Mined Concepts extracted from trained Model Under Test, and (optionally) existing manually Labeled Concepts; and it (2) uses these concepts to obtain CAVs and, e.g., conduct CAV Stability and Concept Attribution Stability tests on object detection and classification models.

Concept Pool Creation/Extension. In some CV domains, it can be challenging to find publicly available datasets with high-quality concept labels. In order to streamline the manual annotation process and speed up concept labeling, we utilize unsupervised concept mining. The left side of Fig. 1 depicts the process of creating the Concept Pool (or extending it, if we already have an initial set of Labeled Concepts) by employing the Concept Miner. A concept in the concept pool is represented by a set of images or image patches showing the concept. To extract additional Mined Concepts, the Concept Miner identifies image patches that cause common patterns in the CNN Image Activations. The activations are extracted from the layer of interest of the Backbone of the Model Under Test for Input Images from the mining set. In our work, we utilize ICE [49] as the Concept Miner to obtain the image patches. The workflow of ICE is as follows: (1) it first mines NCAVs; then, for each NCAV and each sample from a test set (2) it applies NCAV inference, i.e., obtains a (non-binary) heatmap of where the NCAV activates in the image, and (3) masks the input image with the binarized heatmap. For details see Sect. 2.2 and [49]. The sets of mined image patches, alias concepts, next undergo Manual Concept Validation: A human annotator assigns a label to each Mined Concepts. These Interpreted Concepts, if meaningful, can either directly be added to the set of Labeled Concepts or be utilized in Synthetic Concept Generation to obtain more complex synthetic concept samples (see Sect. 4.4 and Fig. 3 for more details and visual examples). It should be noted that the Concept Pool, once established, is model-agnostic and can be reused for other models, and that the ICE concept mining approach can be exchanged by any other suitable unsupervised CA method that produces concept heatmaps during inference.

Concept Stability Analysis. Now that the Concept Pool is established, we can perform supervised CA to obtain CAVs for the concepts in the pool. The CAV training is done on the Concept Activations, i.e., CNN activations of concept images from the Labeled Concepts in the Concept Pool. Given CAVs, we can then calculate per-sample concept attribution using, e.g., backpropagation-based sensitivity methods [23]. The resulting CAVs and Concept Sensitivity Scores can then be used for local and global explanation purposes. To ensure their quality, this work investigates stability (CAV Stability and Concept Attribution Stability) of these for OD use-cases, as detailed in Sect. 3.3.

For supervised CA we use the base TCAV [23] approach: A binary linear classifier is trained to predict presence of a concept from the intermediate neuron activations in the selected CNN layer. The classifier weights serve as CAV, namely the vector that points into the direction of the concept in the latent space. The CAVs are trained in a one-against-all manner on the labeled concept examples from the Concept Pool. For concept attribution, we adopt the sensitivity score calculation from [23]: for a sample is the partial derivative of the CNN output in the direction of the concept, which is calculated as the dot product between the CAV and the gradient vector in the CAV layer. In this paper, we are interested in the stability of this retrieval process for obtaining CAVs and respective concept attributions.

3.2 Concept Analysis in Object Detectors

The post-hoc concept stability assessment framework described above, in particular the used TCAV and ICE methods, is out-of-the-box suitable for use with classification models. However, object detection networks pose additional challenges: besides larger sizes, they have different prediction heads and employ suppressive post-processing of the output.

Multiple Predictions. Unlike classification models that produce a single set of predictions per sample, object detectors may produce multiple predictions, requiring adaptions to TCAV and ICE.

For ICE the concept weights and importance estimation component require adjustments. The pipeline assesses the effect of small modifications to each concept on the final class prediction. For classification, this estimation is performed on a per-sample basis. For object detection, we switch that calculation to the per-bounding box approach.

The TCAV process of calculating CAVs remains unchanged. However, TCAV employs gradients backpropagated from the corresponding class neuron and concept CAV to assess the concept sensitivity of the desired output class. In object detectors, concept sensitivity can be computed for each prediction, or bounding box, by starting the backpropagation from the desired class neuron of the bounding box.

It is important to note that some object detection architectures predict an objectness score for each bounding box, which can serve as an alternative starting neuron for the backpropagation [24]. Nonetheless, we only use class neurons for this purpose in our experiments.

Suppressive Post-processing. Another challenge in object detection is explanation of False Negatives (FNs), which refer to the absence of detection for a desired object. Users may be especially interested in explanations regarding FN areas, e.g., for debugging purposes. While the raw OD CNN bounding box predictions usually cover all image areas, post-processing may filter out bounding boxes due to low prediction certainty or suppress them during Non-Maximum Suppression (NMS). To still evaluate concept sensitivity for FNs, we compare the list of raw unprocessed bounding boxes with the desired object bounding boxes specified by the user. We then use Intersection over Union (IoU) to select the best raw bounding boxes that match the desired ones, and these selected bounding boxes (i.e., their output neurons) are used for further evaluation.

3.3 Evaluation of Concept Stability

Concept Retrieval Stability. We are interested in concepts that are both consistent and separable in the latent space. However, these two traits have not been considered jointly in previous work. Thus, we define the generalized concept stability $\mathcal {S}_{L_{k}}$ metric for a concept C in layer $L_{k}$ applicable to a test set X as

$$\begin{aligned} \mathcal {S}_{L_{k}}^C(X) \,{:}{=}\, \texttt {separability}_{L_{k}}^C(X) \times \texttt {consistency}_{L_{k}}^C, \end{aligned}$$

(1)

where, $\texttt {separability}_{L_{k}}^C(X)$ represents how well tested concepts are separated from each other in the feature space, $\texttt {consistency}_{L_{k}}^C$ denotes how similar are representations for the same concept when obtained with different initialization conditions.

Separability. The binary classification performance of each CAV reflects how effectively the concept is separated from other concepts, when evaluated in a concept-vs-other manner rather than a concept-vs-random approach. In the concept-vs-other scenario, the non-concept-class consists of all other concepts, whereas it is a single randomly selected other concept in the concept-vs-random scenario [23]. We choose the separability from Eq. 1 for a single concept C on the test set X as:

$$\begin{aligned} \textstyle \texttt {separability}_{L_{k}}^C(X) \,{:}{=}\, f1_{L_{k}}^C(X) \,{:}{=}\, \frac{1}{N} \sum _{i=1}^N f1(CAV_{L_k,i}^C; X) \in [0,1] \end{aligned}$$

(2)

where $f1_{L_{k}}^C$ is the mean of relative F1-scores $f1(-;X)$ on X for $\text {CAV}_{L_k,i}^C$ of C in layer $L_k$ for N runs i with different initialization conditions for CAV training.

Consistency. In TCAV, during the CAVs training, a limited amount of concept samples may lead to model underfitting, and significant inconsistency between CAVs obtained for different training samples and initialization conditions [23]. Since cosine similarity was shown to be a suitable similarity measure for CAVs [10, 23] we set the consistency measure to the mean cosine similarity between the CAVs in layer $L_k$ of N runs:

$$\begin{aligned} \texttt {consistency}_{L_{k}}^C \,{:}{=}\, \cos _{L_{k}}^C \,{:}{=}\, \tfrac{2}{N(N-1)}\sum _{i=1}^{N} \sum _{j=1}^{i-1} {\cos (\text {CAV}_{L_k,i}^C, \text {CAV}_{L_k,j}^C)} \,, \end{aligned}$$

(3)

where $\cos (-,-)$ is cosine similarity, here between CAVs of the same concept C and layer $L_k$ obtained during different runs i, j.

Concept Attribution Stability. Small changes in the input space may significantly change the output and, thus, the gradient values. TCAV requires gradients to calculate the concept sensitivity (attribution) of given prediction. Hence, gradient instability may have an impact on the explanations, and, in the worst case, change it from positive to negative attribution or vice versa.

We want to check, if such instability of gradient values influences concept detection. For this, we compare the vanilla gradient approach against a stabilized version using the state-of-the-art gradient stabilization approach SmoothGrad [40]. It diminishes or negates the gradient instability in neural networks by averaging vanilla gradients obtained for multiple copies of the original sample augmented with a minor random noise. For comparison purposes, first the vanilla gradient is propagated backward with respect to the detected object’s class neuron. This neuron is remembered and used then for the gradient backpropagation for noisy copies of SmoothGrad. TCAV concept attributions can naturally be generalized to Smoothgrad, defining them as:

$$\begin{aligned} \text {attr}_{C}^{*}(x) \,{:}{=}\, \text {CAV}_{C} \circ \nabla ^{*} f_{L_k\rightarrow }(f_{\rightarrow L_k}(x)) \;, \end{aligned}$$

(4)

where $\text {attr}_{C}^{*}$ is the attribution of concept C in layer $L_k$ for vanilla gradient ($*=\text {grad}$) or SmoothGrad ($*=\text {SG}$) for a single prediction for sample x, $\text {CAV}_C=\text {CAV}_{L_k,.}^C$, and $f_{\rightarrow L_k}$ is the CNN part up to $L_k$, $f_{L_k\rightarrow }$ the mapping from $L_k$ representations to the score of the selected prediction and class.

Acc. As one approach, for each tested layer we build a confusion matrix for multiple test samples and bounding boxes therein, where $y_\text {true} = \text {sign}(\text {attr}_{i}^\text {grad})$ and $y_{\text {predicted}} = \text {sign}(\text {attr}_{i}^\text {SG})$ are predictions to compare the sign of concept attribution for SmoothGrad and vanilla gradient. On this, accuracy (Acc) is used to show the fraction of cases where SmoothGrad and vanilla gradient concept attributions have the same sign, i.e., where gradient instability has no impact.

CAD. As a second approach, to qualitatively evaluate the difference between the concept attribution of SmoothGrad and the vanilla gradient in the tested layer, we introduce the Concept Attribution Deviation (CAD) metric. It shows the average absolute attribution value change for all used concepts C and N runs, and, thus, describes the impact of gradient instability on concept attribution in a layer:

$$\begin{aligned} \text {CAD}(x) \,{:}{=}\, \frac{ \textstyle \sum _{C}\sum _{i}^{N} \left| \text {attr}_{C,i}^\text {grad}(x) - \text {attr}_{C,i}^\text {SG}(x) \right| }{ \textstyle \sum _{C}\sum _{i}^{N} \left| \text {attr}_{C,i}^\text {grad}(x) \right| } \;. \end{aligned}$$

(5)

3.4 CAV Dimensionality

The stability can be greatly affected by the number of CAV parameters, which is especially important in object detectors with large intermediate representations. Moreover, the larger CAV size leads to increased memory and computation requirements. The original TCAV paper proposes using 3D-CAV-vectors [23]. However, alternative translation invariant 1D- [10, 49] and channel invariant 2D-CAV-representations, which have less parameters, are possible. If 3D-CAV’s dimensions of OD’s arbitrary intermediate layer are $C \times H \times W$, then dimensions of 1D- and 2D-CAV are $C \times 1 \times 1$ and $1 \times H \times W$ respectively, where C, H and, W denote channel, height and, width dimensions respectively (see Fig. 2).

The 1D-CAV provides during inference one presence score per channel, and possesses the property of translation invariance. This implies that only the presence or absence of a concept in the input space matters, rather than its size or location. In contrast, the 2D-CAV concentrates solely on the location of the concept, providing one presence score for each activation map pixel location. This can also be advantageous in certain circumstances (e.g., for the concepts “sky” or “ground”). The 3D-CAV provides during inference a single concept presence score for the complete image, depending both on location, size, and filter distribution of the concept. Meanwhile, it comes with the disadvantage of larger size and higher computational requirements.

Original 3D-CAVs do not require special handling of the latent space. But for evaluation of 1D- and 2D-CAVs, we preprocess incoming latent space vectors to match the CAV dimensionality by taking the mean along width and height, or channel dimensions respectively, as already successfully applied in previous work [7, 14]. In other words, for the calculation of CAV with reduced dimensions, we aggregate activation functions and gradients along certain dimensions. CAV dimension size is a hyperparameter, which may impact CAV memory consumption, CAV stability, the overall performance of concept separation, CAV training speed, and following operations with CAVs (e.g., evaluation of the concept attribution). Thus, we also propose using our stability metrics for the selection of the optimal CAV dimension size.

4 Experimental Setup

We use the proposed framework to conduct the following experiments for OD and classification models: 1) evaluation of concept representation stability via the selection of representation dimensionality; 2) inspection of the impact of gradient instability in CNNs on concept attribution. The process of concept analysis in classifiers can be carried out using the default approaches proposed in the original papers [23, 49], and it does not require any special handling.

In the following subsections, we describe selected experimental datasets and concept data preparation, models, model layers, and hyperparameter choices. Experiment results and interpretation are described later in Sect. 5.

4.1 Datasets

Object Detection. For unsupervised concept mining in object detectors and experiments with ODs, we use the validation set of MS COCO 2017 [26] dataset, containing 5000 real world images with 2D object bounding box annotations, including many outdoor and urban street scenarios. We mine concepts from bounding boxes of person class with the area of at least 20000 pixels, so the mined concept images have reasonable size and can be visually analyzed by a human. The resulting subset includes more than 2679 bounding boxes of people in different poses and locations extracted from 1685 images.

Classification. For concept stability experiments with classification model, we use BRODEN [3] and CycleGAN Zebras [50] datasets. BRODEN contains more than 60,000 images image and pixel-wise annotations for almost 1200 concepts of 6 categories. CycleGAN Zebras contains almost 1500 images of zebras suitable for supervised concept analysis.

4.2 Models

To evaluate the stability of semantic representations in the CNNs of different architectures and generations, we selected three object detectors and three classification models with various backbones.

Object Detection Models:

one-stage YOLOv5s^{Footnote 1} [20] (residual DarkNet [16, 32] backbone);
two-stage FasterRCNN^{Footnote 2} [33] (inverted residual MobileNetV3 [17] backbone);
one-stage SSD^{Footnote 3} [28] (VGG [39] backbone).

All evaluated object detection models are pre-trained on MS COCO [26] dataset. The models are further referred to as YOLO5, RCNN, and SSD.

Classification Models:

residual ResNet50^{Footnote 4} [16];
compressed SqueezeNet1.1^{Footnote 5} [18]
inverted residual EfficientNet-B0^{Footnote 6} [42]

Classification models are pre-trained on ImageNet1k [8] dataset. The models are further referred to as ResNet, SqueezeNet, and EfficientNet.

Table 1. Shorthands $l_i$ of selected classification CNN intermediate layers for Concept Analysis (l=layer, b=block, f=features, squeeze=s).

Full size table

Table 2. Shorthands $l_i$ of selected OD CNN intermediate layers for Concept Analysis (b=block, f=features, e=extra, c=conv).

Full size table

4.3 Layer Selection for Concept Analysis

To identify any influence of the layer depth on extracted concept stability, we must analyze the latent space of DNNs across multiple layers. To accomplish this, we extract intermediate representations and concepts from ten intermediate convolutional layers of ODs and seven intermediate convolutional layers of classifiers. These layers are uniformly distributed throughout the backbones of CNNs. The names of the selected layers for each network are listed in Table 1 and Table 2, where each layer is identified by a symbolic name in the format of $l_x$, where x denotes the relative depth of the layer in the backbone (i.e., layers from $l_1$ to $l_7$ for classifiers and from $l_1$ to $l_{10}$ for ODs).

In experiments, we use semantic concepts of medium-level (e.g., composite shapes) or high-level (e.g., human body parts) abstraction (Sect. 4.4). Shallow layers are ignored, as they mostly recognize concepts of low-level abstraction (e.g., color, texture), whilst deeper layers recognize complex objects and their parts [45, 48].

4.4 Synthetic Concept Generation and Concept Selection

Object Detection. To conduct concept analysis experiments with object detectors, we generate synthetic concept samples using concept information extracted from MS COCO (see Fig. 1 and Sect. 3.1). We used ICE [49] to mine concept-related superpixels (image patches) from MS COCO bounding boxes of the person class that have an area of at least 20,000 pixels. Then, we visually inspected 30 mined concepts (10 for each following YOLO5 layer: 8.cv3.c, 9.cv1.c, and 10.c; see caption of Table 2 for notations) and selected 3 concepts semantically corresponding to labels “legs”, “head”, and “torso”. Interestingly, we found that several concepts (e.g., “head”, “legs”) were present in more than one layer. We only picked one of the concepts of the same type based on the subjective quality. For each selected concept, we save 100 concept-related superpixels using a concept mask binarization threshold of 0.5.

Examples of the MS COCO synthetic concepts can be seen in Fig. 3. To generate a synthetic concept sample of a size of $640 \times 480$ pixels, 1 to 5 concept-related superpixels are selected and placed on a background of random noise drawn from a uniform distribution (alternatively, images of natural environments can be used as a background). Additionally, random scaling is applied to the superpixels before placement with a random factor between 0.9 and 1.1.

Classification. We use labeled concepts “stripes”, “zigzags”, and “dots” from BRODEN dataset to analyze the stability of concept representation and attribution in classification models on the examples of zebra images from the CycleGAN dataset.

4.5 Experiment-Specific Settings

Experiment 1: CAV Stability and Dimensionality. We conduct CAV-stability experiments for 1D-, 2D-, and 3D-CAVs (see Sect. 3.4) with YOLO5, RCNN, SSD, ResNet, SqueezeNet, and EfficientNet models to measure the potential concept retrieval stability in different networks and setups. For stability measurement, the number N of CAV retrieval runs with different initialization parameters is set to 15, which is similar to the ensemble size in [31], as we observed it is a good trade-off regarding computational speed. In each run, we utilize 100 samples per concept, dividing them into 80 for concept extraction and 20 for validation (estimation of f1).

To further examine the influence of the number of concept training samples on CAV stability, we also test three additional setups with 20, 40, and 60 training concept samples. The test has been conducted for all six networks.

Experiment 2: Gradient Stability in Concept Detection. For gradient stability experiments, ResNet and YOLO5 are selected as models with the best CAV stability from Experiment 2. Moreover, we validate setups with 1D- and 3D-CAVs to see how gradient instability affects concept attribution in CAVs of different dimensionality. For the computation of SmoothGrad, we use the hyperparameter values recommended in [40]: the number of noisy copies N is set to 50, and the amount of applied Gaussian noise is set to 10%.

5 Experimental Results

Table 3. Stability of generated CAVs of different dimensions for YOLO5.

Full size table

Table 4. Stability of generated CAVs of different dimensions for RCNN.

Full size table

Table 5. Stability of generated CAVs of different dimensions for SSD.

Full size table

Table 6. Stability of generated CAVs of different dimensions for ResNet.

Full size table

Table 7. Stability of generated CAVs of different dimensions for SqueezeNet.

Full size table

Table 8. Stability of generated CAVs of different dimensions for EfficientNet.

Full size table

5.1 CAV Stability and Dimensionality

The CAV stability results for 1D-, 2D- and 3D-CAVs in different layers of YOLO5, RCNN, SSD, ResNet, SqueezeNet, and EfficientNet networks are presented in Tables 3 to 8. In addition, Figs. 4 to 9 visualize the impact of number of training concept samples on the overall stability of 1D-, 2D- and 3D-CAVs.

CAV Dimensionality Impact. 3D-CAVs are obtained without intermediate representation aggregation, and they demonstrate good concept separation (f1) that can sometimes even outperform that of 1D-CAVs. This is typical for classifiers, where, for instance, in all layers of ResNet (Table 6) f1 of 3D-CAVs is the highest. However, they for all models exhibit mediocre CAV consistency (cos), possibly due to the larger number of parameters and a relatively small number of training concept samples. Overall, 3D-CAVs are less stable than 1D-CAVs, but still can be used for CA.

In contrast, 2D-CAVs exhibit relatively high consistency (e.g., in Table 6, layers $l_5$, $l_6$, and $l_7$ have the top cos values for 2D-CAVs), but they have the worst concept separation (f1), as observed in all tables. As a result, the overall 2D-CAV stability in all models is the worst. In 2D-CAVs, no distinction is made between different channels in the latent space due to 3D-to-2D aggregation. The noticeable reduction of concept separation (f1) in 2D-CAVs reinforces the assumption made in other works (e.g., [3, 10]) that concept information is encoded in different convolutional filters or their linear combinations.

1D-CAVs achieve the best overall CAV-stability due to their (mostly) best consistency (cos) and good concept separation (f1). Moreover, 1D-CAVs have the advantage of fast computation speed since they have fewer parameters. These unique features of 1D-CAVs make them highly stable even in shallow layers, where other CAVs may experience low stability. For example, in Table 3, the stability of 1D-CAVs in layer $l_1$ $S_{L_k}=0.732$ is substantially higher than that of 2D- and 3D-CAVs, which are only 0.223 and 0.199, respectively.

Based on our empirical findings, we recommend using 1D-CAV as the default representation for most applications due to its superior overall stability. However, for safety-critical applications, we advise using our stability assessment methodology prior to CA.

Concept Abstraction Level Impact. In OD models, experiments are conducted with concepts of medium-to-high levels of abstraction (complex shapes and human body parts), which are usually detected in middle and deep layers of the network [45]. Thus, it is expected that there will be worse concept separation (f1) in shallow layers, and this has indeed been observed across all dimension sizes of CAVs (as shown in Tables 3–8).

However, this observation is not always valid for 2D-CAVs, as results have shown that concept separation drops in some deeper layers. For instance, in Table 4 $l_4$ and $l_7$ have f1 values 0.420 and 0.448, while for $l_1$ it is 0.530. Also, Table 4 shows that the increase of f1 for 2D-CAVs is not as high as it is for 1D- and 3D-CAVs. The range of f1 for 2D-CAVs is between 0.420 to 0.659, whereas for 3D-CAVs, it is between 0.536 to 0.941. These findings further support the hypothesis that concept information is encoded in linear combinations of convolutional filters [3, 10].

Impact of Number of CAV Training Samples. Figures 4 to 9 demonstrate that increasing the number of training concept images has a positive impact on the stability of CAV. However, labeling concepts is a time-consuming and expensive process. Therefore, we recommend using at least 40 to 60 concept-related samples for training each CAV. In most cases, the stability obtained with 80 samples is only marginally better than that obtained with 40 (see Fig. 8) or 60 samples (see Fig. 4 and Fig. 6).

CNN Architecture Impact. From Tables 3 to 8 we see that top CAV stability ($S_{L_k}$) values achieved by ODs and classifiers for CAVs trained on the same concept datasets are very similar. However, due to architectural differences, the top stability values are achieved at different relative layer depths. For example, the top stabilities for 1D-CAVs in YOLO5, RCNN, and SSD object detectors are achieved in layers $l_6$, $l_9$, and $l_6$, respectively, with corresponding values of 0.915, 0.882, and 0.909 (see Tables 3, 4, and 5). Similarly, the top stability values for 1D-CAVs for ResNet, SqueezeNet, and EfficientNet classifiers are achieved in layers $l_5$, $l_7$, and $l_6$, respectively, with corresponding values of 0.900, 0.876, and 0.885 (Table 6, 7, and 8). The same tables show that the layers with top stability values may vary for different sizes of CAV dimensions even within the same model (e.g., in Table 3, the YOLO5 top stabilities for 1D-, 2D-, and 3D-CAV are obtained in layers $l_6$, $l_9$, and $l_8$, respectively).

The CAV stability differences among inspected architectures can also be observed in Figs. 4 to 9. For example, in the case of 1D-CAV of ResNet (Fig. 7) and 1D- and 3D-CAVs of SqueezeNet (Fig. 8), we observe that the stability value quickly reaches its optimal values in the first one or two layers and remains similar in deeper layers. In other cases, such as 3D-CAV of SSD (Fig. 6) or all CAV dimensions of RCNN (Fig. 5), stability gradually increases with the relative depth of the layer. Finally, the stabilities of 1D- and 3D-CAVs of YOLO5 (Fig. 4) or 1D- and 3D-CAVs of EfficientNet (Fig. 9) grow until an optimal layer in the middle and slowly shrink after it.

Table 9. Gradient stability in layers of ResNet for 1D-CAV.

Full size table

Table 10. Gradient stability in layers of ResNet for 3D-CAV.

Full size table

Table 11. Gradient stability in layers of YOLO5 for 1D-CAV.

Full size table

Table 12. Gradient stability in layers of YOLO5 for 3D-CAV.

Full size table

5.2 Gradient Stability in Concept Detection

Based on the experimental results, it can be concluded that the negative impact of gradient instability on concept analysis using TCAV is minimal. The results presented in Tables 9 and 10 are based on 1500 concept attribution predictions (see Eq. 4) for 500 images and 3 concepts per image, for each tested layer of ResNet with 1D- and 3D-CAVs, respectively. Similarly, Tables 11 and 12 are built for each tested layer of YOLO5 with 1D- and 3D-CAVs, respectively, using 2136 concept attribution predictions for 712 bounding boxes and 3 concepts per bounding box.

SmoothGrad Impact. In the Tables 9 to 12, the relative depth of CNN backbone layers is increasing from left to right, while gradient backpropagation depth from outputs to CAV layer is increasing in right to left order. As expected, the gradient is becoming more unstable with backpropagation depth [40], resulting in higher $\text {CAD}$ values in shallow layers compared to deeper layers. The higher number of concept attribution sign flips is observed in shallow layers (see Sect. 3.3), where accuracy (Acc) values in those layers are low. These observations confirm the negative correlation between CAD and Acc, where CAD increases as Acc decreases. This suggests that gradient smoothing techniques, such as SmoothGrad, can have a higher impact on concept attribution values in shallow layers, where the gradient instability is higher.

Despite the negative correlation between CAD and Acc values, the overall accuracy values remain above 0.9 for all layers in the provided tables. The lowest accuracy value for ResNet of $\text {Acc}=0.90$ is observed in Table 10 for $l_1$. For YOLO5 the lowest $\text {Acc}=0.91$ is obtrained in $l_5$ (Table 12). This indicates that the sign of concept attribution is only changed for a minority of predictions across all tested networks and configurations. However, it is worth noting that CAD values can be high in shallow layers, for instance, $\text {CAD}=31.3\%$ at layer $l_1$ of Table 10, resulting in a higher rate of concept attribution sign flipping compared to deeper layers.

The use of SmoothGrad comes at a higher computational cost compared to vanilla gradient. It is more than N times (number of noisy copies) computationally expensive, and mostly impacts concept attribution in shallow and middle layers of networks. Therefore, it is advisable to use SmoothGrad when conducting concept analysis in shallow layers of networks with large backbones such as ResNet101 or ResNet152.

CAV Dimensionality Impact. The use of 1D-CAV representations generally results in lower CAD values than 3D-CAVs, typically with a difference of 2–3%. This behavior can be attributed to the higher stability of 1D-CAVs, which is in turn caused by the lower number of parameters. The observation is consistent across all layers of ResNet and the majority of YOLO5 layers, as shown in Tables 9 to 12. However, the dimensionality of CAV does not affect the behavior of gradient instability in other regards: CAD remains higher and Acc lower in shallow layers regardless of the CAV dimensionality.

6 Conclusion and Outlook

This study proposes a framework and metrics for evaluating the layer-wise stability of global vector representations in object detection and classification CNN models for explainability purposes. We introduced two stability metrics: concept retrieval stability and concept attribution stability. Also, we proposed adaptation methodologies for unsupervised CA and supervised gradient-based CA methods for combined, labeling-efficient application in object detection models.

Our concept retrieval stability metric jointly evaluates the consistency and separation in the feature space of concept semantic concept representations obtained across multiple runs with different initialization parameters. We used the TCAV method as an example to examine factors that affect stability and found that aggregated 1D-CAV representations offer the best performance. Furthermore, we determined that a minimum of 60 training samples per concept is necessary to ensure high stability in most cases.

The second metric, concept attribution stability, assesses the impact of gradient smoothing techniques on the stability of concept attribution. Our observations suggest that 1D-CAVs are more resistant to gradient instability, particularly in deep layers, and we recommend using gradient smoothing in shallow layers of deep network backbones.

Our work provides valuable quantitative insights into the robustness of concept representation, which can inform the selection of network layers and concept representations for CA in safety-critical applications. For future work, it will be interesting to apply the proposed approaches and metrics to alternative global concept vector representations and perform comparative analysis.

Notes

References

32, I.S.: ISO 26262-1:2018(En): Road Vehicles – Functional Safety – Part 1: Vocabulary (2018). https://www.iso.org/standard/68383.html
Abid, A., Yuksekgonul, M., Zou, J.: Meaningfully debugging model mistakes using conceptual counterfactual explanations. In: Proceedings of the 39th International Conference on Machine Learning, pp. 66–88. PMLR, June 2022
Google Scholar
Bau, D., Zhou, B., Khosla, A., Oliva, A., Torralba, A.: Network dissection: quantifying interpretability of deep visual representations. In: Proceedings of the IEEE Conference Computer Vision and Pattern Recognition, pp. 6541–6549 (2017)
Google Scholar
Bodria, F., Giannotti, F., Guidotti, R., Naretto, F., Pedreschi, D., Rinzivillo, S.: Benchmarking and survey of explanation methods for black box models. arXiv preprint arXiv:2102.13076 (2021)
Carvalho, D.V., Pereira, E.M., Cardoso, J.S.: Machine learning interpretability: a survey on methods and metrics. Electronics 8(8), 832 (2019). https://doi.org/10.3390/electronics8080832
Article Google Scholar
Chen, Z., Bei, Y., Rudin, C.: Concept whitening for interpretable image recognition. Nat. Mach. Intell. 2(12), 772–782 (2020)
Article Google Scholar
Chyung, C., Tsang, M., Liu, Y.: Extracting interpretable concept-based decision trees from CNNs. In: Proceedings of the 2019 ICML Workshop Human in the Loop Learning, vol. 1906.04664, June 2019. CoRR
Google Scholar
Deng, J., Dong, W., Socher, R., Li, L.J., Li, K., Fei-Fei, L.: Imagenet: a large-scale hierarchical image database. In: 2009 IEEE Conference on Computer Vision and Pattern Recognition, pp. 248–255. IEEE (2009)
Google Scholar
Esser, P., Rombach, R., Ommer, B.: A disentangling invertible interpretation network for explaining latent representations. In: Proceedings of the 2020 IEEE Conference on Computer Vision and Pattern Recognition, pp. 9220–9229. IEEE, June 2020. https://doi.org/10.1109/CVPR42600.2020.00924
Fong, R., Vedaldi, A.: Net2Vec: quantifying and explaining how concepts are encoded by filters in deep neural networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 8730–8738 (2018)
Google Scholar
Ge, Y., et al.: A peek into the reasoning of neural networks: interpreting with structural visual concepts. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 2195–2204 (2021)
Google Scholar
Ghorbani, A., Wexler, J., Zou, J.Y., Kim, B.: Towards automatic concept-based explanations. Adv. Neural Inf. Process. Syst. 32 (2019)
Google Scholar
Goodman, B., Flaxman, S.: European union regulations on algorithmic decision-making and a right to explanation. AI Mag. 38(3), 50–57 (2017). https://doi.org/10.1609/aimag.v38i3.2741
Graziani, M., Andrearczyk, V., Marchand-Maillet, S., Müller, H.: Concept attribution: explaining CNN decisions to physicians. Comput. Biol. Med. 123, 103865 (2020). https://doi.org/10.1016/j.compbiomed.2020.103865
Article Google Scholar
Graziani, M., Andrearczyk, V., Müller, H.: Regression concept vectors for bidirectional explanations in histopathology. In: Stoyanov, D., et al. (eds.) MLCN/DLF/IMIMIC -2018. LNCS, vol. 11038, pp. 124–132. Springer, Cham (2018). https://doi.org/10.1007/978-3-030-02628-8_14
Chapter Google Scholar
He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 770–778 (2016)
Google Scholar
Howard, A., et al.: Searching for MobileNetV3. In: Proceedings of the IEEE/CVF International Conference on Computer Vision, pp. 1314–1324 (2019)
Google Scholar
Iandola, F.N., Han, S., Moskewicz, M.W., Ashraf, K., Dally, W.J., Keutzer, K.: Squeezenet: alexnet-level accuracy with 50x fewer parameters and $<$0.5 mb model size. arXiv preprint arXiv:1602.07360 (2016)
ISO/TC 22/SC 32: ISO 26262-8:2018(En): Road Vehicles — Functional Safety — Part 8: Supporting Processes, ISO 26262:2018(En), vol. 8. International Organization for Standardization, second edn., December 2018
Google Scholar
Jocher, G.: YOLOv5 in PyTorch, ONNX, CoreML, TFLite, October 2020. https://github.com/ultralytics/yolov5, https://doi.org/10.5281/zenodo.4154370
Kazhdan, D., Dimanov, B., Jamnik, M., Liò, P., Weller, A.: Now you see me (CME): concept-based model extraction. In: Proceedings of the 29th ACM International Conference Information and Knowledge Management Workshops. CEUR Workshop Proceedings, vol. 2699. CEUR-WS.org (2020)
Google Scholar
Kazhdan, D., Dimanov, B., Terre, H.A., Jamnik, M., Liò, P., Weller, A.: Is disentanglement all you need? Comparing concept-based & disentanglement approaches. CoRR abs/2104.06917 (2021)
Google Scholar
Kim, B., Wattenberg, M., Gilmer, J., Cai, C., Wexler, J., Viegas, F., et al.: Interpretability beyond feature attribution: quantitative testing with concept activation vectors (TCAV). In: International Conference on Machine Learning, pp. 2668–2677. PMLR (2018)
Google Scholar
Kirchknopf, A., Slijepcevic, D., Wunderlich, I., Breiter, M., Traxler, J., Zeppelzauer, M.: Explaining yolo: leveraging grad-cam to explain object detections. arXiv preprint arXiv:2211.12108 (2022)
Koh, P.W., et al.: Concept bottleneck models. In: International Conference on Machine Learning, pp. 5338–5348. PMLR (2020)
Google Scholar
Lin, T.-Y., et al.: Microsoft COCO: common objects in context. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds.) ECCV 2014. LNCS, vol. 8693, pp. 740–755. Springer, Cham (2014). https://doi.org/10.1007/978-3-319-10602-1_48
Chapter Google Scholar
Linardatos, P., Papastefanopoulos, V., Kotsiantis, S.: Explainable AI: a review of machine learning interpretability methods. Entropy 23(1), 18 (2021). https://doi.org/10.3390/e23010018
Article Google Scholar
Liu, W., et al.: SSD: single shot multibox detector. In: Leibe, B., Matas, J., Sebe, N., Welling, M. (eds.) ECCV 2016. LNCS, vol. 9905, pp. 21–37. Springer, Cham (2016). https://doi.org/10.1007/978-3-319-46448-0_2
Chapter Google Scholar
Margeloiu, A., Ashman, M., Bhatt, U., Chen, Y., Jamnik, M., Weller, A.: Do concept bottleneck models learn as intended? arXiv preprint arXiv:2105.04289 (2021)
Pfau, J., Young, A.T., Wei, J., Wei, M.L., Keiser, M.J.: Robust semantic interpretability: revisiting concept activation vectors. In: Proceedings of the 2021 ICML Workshop Human Interpretability in Machine Learning, April 2021. CoRR
Google Scholar
Rabold, J., Schwalbe, G., Schmid, U.: Expressive explanations of DNNs by combining concept analysis with ILP. In: Schmid, U., Klügl, F., Wolter, D. (eds.) KI 2020. LNCS (LNAI), vol. 12325, pp. 148–162. Springer, Cham (2020). https://doi.org/10.1007/978-3-030-58285-2_11
Chapter Google Scholar
Redmon, J., Farhadi, A.: YOLOv3: an incremental improvement. arXiv preprint arXiv:1804.02767 (2018)
Ren, S., He, K., Girshick, R., Sun, J.: Faster R-CNN: towards real-time object detection with region proposal networks. Adv. Neural Inf. Process. Syst. 28 (2015)
Google Scholar
Sawada, Y., Nakamura, K.: Concept bottleneck model with additional unsupervised concepts. IEEE Access 10, 41758–41765 (2022)
Article Google Scholar
Schwalbe, G.: Verification of size invariance in DNN activations using concept embeddings. In: Maglogiannis, I., Macintyre, J., Iliadis, L. (eds.) AIAI 2021. IAICT, vol. 627, pp. 374–386. Springer, Cham (2021). https://doi.org/10.1007/978-3-030-79150-6_30
Chapter Google Scholar
Schwalbe, G.: Concept embedding analysis: a review, March 2022. arXiv:2203.13909 [cs, stat]
Schwalbe, G., Finzel, B.: A comprehensive taxonomy for explainable artificial intelligence: a systematic survey of surveys on methods and concepts. Data Min. Knowl. Discov. (2023). https://doi.org/10.1007/s10618-022-00867-8
Schwalbe, G., Wirth, C., Schmid, U.: Concept embeddings for fuzzy logic verification of deep neural networks in perception tasks. arXiv preprint arXiv:2201.00572 (2022)
Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556 (2014)
Smilkov, D., Thorat, N., Kim, B., Viégas, F., Wattenberg, M.: Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825 (2017)
Sundararajan, M., Taly, A., Yan, Q.: Axiomatic attribution for deep networks. In: Proceedings of the 34th International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 70, pp. 3319–3328. PMLR (2017)
Google Scholar
Tan, M., Le, Q.: Efficientnet: rethinking model scaling for convolutional neural networks. In: International Conference on Machine Learning, pp. 6105–6114. PMLR (2019)
Google Scholar
Veale, M., Borgesius, F.Z.: Demystifying the draft EU artificial intelligence act-analysing the good, the bad, and the unclear elements of the proposed approach. Comput. Law Rev. Int. 22(4), 97–112 (2021)
Article Google Scholar
Vilone, G., Longo, L.: Classification of explainable artificial intelligence methods through their output formats. Mach. Learn. Knowl. Extr. 3(3), 615–661 (2021)
Article Google Scholar
Wang, D., Cui, X., Wang, Z.J.: Chain: concept-harmonized hierarchical inference interpretation of deep convolutional neural networks. arXiv preprint arXiv:2002.01660 (2020)
Wu, W., et al.: Towards global explanations of convolutional neural networks with concept attribution. In: Proceedings of the 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition, pp. 8649–8658 (2020). https://doi.org/10.1109/CVPR42600.2020.00868
Zhang, Q., Wang, W., Zhu, S.C.: Examining CNN representations with respect to dataset bias. In: Proceedings of the 32nd AAAI Conference on Artificial Intelligence, pp. 4464–4473. AAAI Press (2018)
Google Scholar
Zhang, Q., Wu, Y.N., Zhu, S.C.: Interpretable convolutional neural networks. In: Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 8827–8836 (2018)
Google Scholar
Zhang, R., Madumal, P., Miller, T., Ehinger, K.A., Rubinstein, B.I.: Invertible concept-based explanations for CNN models with non-negative concept activation vectors. In: Proceedings of the AAAI Conference on Artificial Intelligence, pp. 11682–11690 (2021)
Google Scholar
Zhu, J.Y., Park, T., Isola, P., Efros, A.A.: Unpaired image-to-image translation using cycle-consistent adversarial networks. In: Proceedings of the IEEE International Conference on Computer Vision, pp. 2223–2232 (2017)
Google Scholar

Download references

Acknowledgments

The research leading to these results is funded by the German Federal Ministry for Economic Affairs and Climate Action within the project “KI Wissen - Entwicklung von Methoden für die Einbindung von Wissen in maschinelles Lernen”. The authors would like to thank the consortium for the successful cooperation.

Author information

Authors and Affiliations

Continental AG, Hanover, Germany
Georgii Mikriukov, Gesina Schwalbe & Christian Hellert
Hochschule Anhalt, Bernburg, Germany
Georgii Mikriukov & Korinna Bade

Authors

Georgii Mikriukov
View author publications
You can also search for this author in PubMed Google Scholar
Gesina Schwalbe
View author publications
You can also search for this author in PubMed Google Scholar
Christian Hellert
View author publications
You can also search for this author in PubMed Google Scholar
Korinna Bade
View author publications
You can also search for this author in PubMed Google Scholar

Corresponding author

Correspondence to Georgii Mikriukov .

Editor information

Editors and Affiliations

Technological University Dublin, Dublin, Ireland
Luca Longo

Rights and permissions

Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

Reprints and permissions

Copyright information

About this paper

Cite this paper

Mikriukov, G., Schwalbe, G., Hellert, C., Bade, K. (2023). Evaluating the Stability of Semantic Concept Representations in CNNs for Robust Explainability. In: Longo, L. (eds) Explainable Artificial Intelligence. xAI 2023. Communications in Computer and Information Science, vol 1902. Springer, Cham. https://doi.org/10.1007/978-3-031-44067-0_26

Download citation

DOI: https://doi.org/10.1007/978-3-031-44067-0_26
Published: 21 October 2023
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-44066-3
Online ISBN: 978-3-031-44067-0
eBook Packages: Computer ScienceComputer Science (R0)

Publish with us

Policies and ethics

Evaluating the Stability of Semantic Concept Representations in CNNs for Robust Explainability

Abstract

Similar content being viewed by others

Playing to distraction: towards a robust training of CNN classifiers through visual explanation techniques

Computer Vision Explainability for Object Detection in Safety Surveillance

Expressive Explanations of DNNs by Combining Concept Analysis with ILP

Keywords

1 Introduction

2 Related Work

2.1 Supervised Concept Analysis

2.2 Unsupervised Concept Analysis

2.3 Concept Analysis in Object Detection

3 Proposed Method

3.1 Stability Evaluation Framework

3.2 Concept Analysis in Object Detectors

3.3 Evaluation of Concept Stability

3.4 CAV Dimensionality

4 Experimental Setup

4.1 Datasets

4.2 Models

4.3 Layer Selection for Concept Analysis

4.4 Synthetic Concept Generation and Concept Selection

4.5 Experiment-Specific Settings

5 Experimental Results

5.1 CAV Stability and Dimensionality

5.2 Gradient Stability in Concept Detection

6 Conclusion and Outlook

Notes

References

Acknowledgments

Author information

Authors and Affiliations

Corresponding author

Editor information

Editors and Affiliations

Rights and permissions

Copyright information

About this paper

Cite this paper

Download citation

Share this paper

Publish with us

Search

Navigation