Abstract
Unsupervised learning is a subfield of machine learning that focuses on learning the structure of data without making use of labels. This implies a different set of learning algorithms than those used for supervised learning, and consequently, also prevents a direct transposition of Explainable AI (XAI) methods from the supervised to the less studied unsupervised setting. In this chapter, we review our recently proposed ‘neuralization-propagation’ (NEON) approach for bringing XAI to workhorses of unsupervised learning such as kernel density estimation and k-means clustering. NEON first converts (without retraining) the unsupervised model into a functionally equivalent neural network so that, in a second step, supervised XAI techniques such as layer-wise relevance propagation (LRP) can be used. The approach is showcased on two application examples: (1) analysis of spending behavior in wholesale customer data and (2) analysis of visual features in industrial and scene images.
Keywords
 Explainable AI
 Unsupervised learning
 Neural networks
1 Introduction
Supervised learning has been in the spotlight of machine learning research and applications for the last decade, with deep neural networks achieving record-breaking classification accuracy and enabling new machine learning applications [5, 15, 23]. The success of deep neural networks can be attributed to their ability to implement, with their multiple layers, complex nonlinear functions in a compact manner [32]. Recently, a significant amount of work has been dedicated to making deep neural network models more transparent [13, 24, 40, 41], for example, by proposing algorithms that identify which input features are responsible for a given classification outcome. Methods such as layer-wise relevance propagation (LRP) [3], guided backprop [47], and Grad-CAM [42], have been shown capable of quickly and robustly computing these explanations.
Unsupervised learning is substantially different from supervised learning in that there is no ground-truth supervised signal to match. Consequently, non-neural-network models such as kernel density estimation or k-means clustering, where the user controls the scale and the level of abstraction through a particular choice of kernel or feature representation, have remained highly popular. Despite the predominance of unsupervised machine learning in a variety of applications (e.g. [9, 22]), research on explaining unsupervised models has remained relatively sparse [18, 19, 25, 28, 30] compared to their supervised counterparts. Paradoxically, it might in fact be unsupervised models that most strongly require interpretability. Unsupervised models are indeed notoriously hard to quantitatively validate [51], and the main purpose of applying these models is often to better understand the data in the first place [9, 17].
In this chapter, we review the ‘neuralization-propagation’ (NEON) approach we have developed in the papers [18,19,20] to make the predictions of unsupervised models, e.g. cluster membership or anomaly score, explainable. NEON proceeds in two steps: (1) the decision function of the unsupervised model is reformulated (without retraining) as a functionally equivalent neural network (i.e. it is ‘neuralized’); (2) the extracted neural network structure is then leveraged by the LRP method to produce an explanation of the model prediction. We review the application of NEON to kernel density estimation for outlier detection and to k-means clustering, as presented originally in [18,19,20]. We also extend the reviewed work with a new contribution: the explanation of inlier detection, for which we use the framework of random features [36].
The NEON approach is showcased on several practical examples, in particular, the analysis of wholesale customer data, image-based industrial inspection, and the analysis of scene images. The first example covers the application of the method directly to the raw input features, whereas the image-based examples illustrate how the framework can be applied to unsupervised models built on some intermediate layer of representation of a neural network.
2 A Brief Review of Explainable AI
The field of Explainable AI (XAI) has produced a wealth of explanation techniques and types of explanation. They address the heterogeneity of ML models found in applications and the heterogeneity of questions the user may formulate about the model and its predictions. An explanation may take the form of a simple decision tree (or other intrinsically interpretable model) that approximates the model’s input-output relation [10, 29]. Alternatively, an explanation may be a prototype for the concept represented at the output of the model, specifically, an input example to which the model reacts most strongly [34, 45]. Lastly, an explanation may highlight what input features are the most important for the model’s predictions [3, 4, 7].
In the following, we focus on a well-studied problem of XAI, which is how to attribute the prediction for an individual data point to the input features [3, 4, 29, 37, 45, 48, 50]. Let us denote by \(\mathcal {X} = \mathcal {I}_1 \times \dots \times \mathcal {I}_d\) the input space formed by the concatenation of d input features (e.g. words, pixels, or sensor measurements). We assume a learned model \(f: \mathcal {X} \rightarrow \mathbb {R}\) (supervised or unsupervised), mapping each data point in \(\mathcal {X}\) to a real-valued score measuring the evidence for a class or some other predicted quantity. The problem of attribution can be abstracted as producing for the given function f a mapping \(\mathcal {E}_f : \mathcal {X} \rightarrow \mathbb {R}^d\) that associates to each input example a vector of scores representing the (positive or negative) contribution of each feature. Often, one requires attribution techniques to implement a conservation (or completeness) property, where for all \(\boldsymbol{x}\in \mathcal {X}\) we have \(\boldsymbol{1}^\top \mathcal {E}_f(\boldsymbol{x}) = f(\boldsymbol{x})\), i.e. for every data point the sum of explanation scores over the input features should match the function value.
2.1 Approaches to Attribution
A first approach, occlusion-based, consists of testing the function to explain against various occlusions of the input features [53, 54]. An important method of this family, originally developed in the context of game theory, is the Shapley value [29, 43, 48]. The Shapley value identifies a unique attribution that satisfies some predefined set of axioms of an explanation, including the conservation property stated above. While the approach has strong theoretical underpinnings, computing the explanation requires an exponential number of function evaluations (one evaluation for every subset of input features). This makes the Shapley value in its basic form intractable for any problem with more than a few input dimensions.
Another approach, gradient-based, leverages the gradient of the function, so that a mapping of the function value onto the multiple input dimensions is readily obtained [45, 50]. The method of integrated gradients [50], in particular, attributes the prediction to input features by integrating the gradient along a path connecting some reference point (e.g. the origin) to the data point. The method requires somewhere between ten and a hundred function evaluations, and satisfies the aforementioned conservation property. The main advantage of gradient-based methods is that, by leveraging the gradient information in addition to the function value, one no longer has to perturb each input feature individually to produce an explanation.
A further approach, surrogate-based, consists of learning a simple local surrogate model of the function that is as accurate as possible, and whose structure makes explanation fast and unambiguous [29, 37]. For example, when approximating the function locally with a linear model, e.g. \(g(\boldsymbol{x}) = \sum _{i=1}^d x_i w_i\), the output of that linear model can be easily decomposed onto the input features by taking the individual summands. While the explanation itself is fast to compute, training the surrogate model incurs a significant additional cost, and further care must be taken to ensure that the surrogate model implements the same decision strategy as the original model, in particular, that it uses the same input features.
A last approach, propagation-based, assumes that the prediction has been produced by a neural network, and leverages the neural network structure by casting the problem of explanation as performing a backward pass in the network [3, 42, 47]. The propagation approach is embodied by the Layer-wise Relevance Propagation (LRP) method [3, 31]. The backward pass implemented by LRP consists of a sequence of conservative propagation steps, where each step is implemented by a propagation rule. Let j and k be indices for neurons at layer l and \(l+1\) respectively, and assume that the function output \(f(\boldsymbol{x})\) has been propagated from the top layer to layer \(l+1\). We denote the resulting attribution onto these neurons as the vector of ‘relevance scores’ \((R_k)_k\). LRP then defines ‘messages’ \(R_{j \leftarrow k}\) that redistribute the relevance \(R_k\) to neurons in the layer below. These messages typically have the structure \(R_{j \leftarrow k} = [z_{jk} / \sum _j z_{jk}] \cdot R_k\), where \(z_{jk}\) models the contribution of neuron j to activating neuron k. The overall relevance of neuron j is then obtained by computing \(R_j = \sum _k R_{j \leftarrow k}\). It is easy to show that the application of LRP from one layer to the layer below is conservative. Consequently, the explanation formed by iterating the LRP propagation from the top layer to the input layer is also conservative, i.e. \(\sum _i R_i = \dots = \sum _j R_j = \sum _k R_k = \dots = f(\boldsymbol{x})\). As a result, explanations satisfying the conservation property can be obtained within a single forward/backward pass, instead of the multiple function evaluations required by the approaches described above. The runtime advantage of LRP facilitates explanation of large models and datasets (e.g. GPU implementations of LRP can achieve hundreds of image classification explanations per second [1, 40]).
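To make the propagation step concrete, the following minimal sketch (Python/NumPy, with illustrative variable names) implements one LRP step through a linear layer using the common choice \(z_{jk} = a_j w_{jk}\), and checks the conservation property numerically:

```python
import numpy as np

def lrp_linear(a, W, R, eps=1e-9):
    """One LRP step through a linear layer with contributions z_jk = a_j * w_jk."""
    z = a[:, None] * W            # contributions z_jk, shape (J, K)
    denom = z.sum(axis=0) + eps   # sum_j z_jk, stabilized against division by zero
    return (z / denom) @ R        # R_j = sum_k (z_jk / sum_j z_jk) * R_k

a = np.array([1.0, 2.0, 0.5])     # activations of layer l
W = np.abs(np.random.randn(3, 4)) # weights connecting layer l to layer l+1 (positive for the toy check)
R = np.abs(np.random.randn(4))    # relevance arriving at layer l+1
R_lower = lrp_linear(a, W, R)
print(R_lower.sum(), R.sum())     # conservation: both sums agree up to eps
```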
2.2 Neuralization-Propagation
Propagation-based explanation techniques such as LRP have a computational advantage over approaches based on multiple function evaluations. However, they assume a preexisting neural network structure associated with the prediction function. Unsupervised learning models such as kernel density estimation or k-means are a priori not neural networks. Yet, the fact that these models are not given as neural networks does not preclude the existence of a neural network that implements the same function. If such a network exists (neural network equivalents of some unsupervised models will be presented in Sects. 3 and 4), we can quickly and robustly compute explanations by applying the following two steps:

Step 1: The unsupervised model is ‘neuralized’, that is, rewritten (without retraining) as a functionally equivalent neural network.

Step 2: The LRP method is applied to the resulting neural network, in order to produce an explanation of the prediction of the original model.
These two steps are illustrated in Fig. 1. In practice, for the second step to work well, some restrictions must be imposed on the type of neurons composing the network. In particular, neurons should have a clear directionality in their input space to ensure that a meaningful propagation to the lower layer can be achieved. (We will see in Sects. 3 and 4 that this requirement does not always hold.) Hence, the ‘neuralized model’ must be designed under the double constraint of (1) replicating the decision function of the unsupervised model exactly, and (2) being composed of neurons that enable a meaningful redistribution from the output to the input features.
3 Kernel Density Estimation
Kernel density estimation (KDE) [35] is one of the most common methods for unsupervised learning. The KDE model (or variations of it) has been used, in particular, for anomaly detection [21, 26, 38]. It assumes an unlabeled dataset \(\mathcal {D} = (\boldsymbol{u}_1,\dots ,\boldsymbol{u}_N)\), and a kernel, typically the Gaussian kernel \(\mathbb {K}(\boldsymbol{x},\boldsymbol{x}') = \exp (-\gamma \, \Vert \boldsymbol{x}-\boldsymbol{x}'\Vert ^2)\). The KDE model scores a new data point \(\boldsymbol{x}\) by computing:
\(\tilde{p}(\boldsymbol{x}) = \frac{1}{N} \sum _{k=1}^N \mathbb {K}(\boldsymbol{x},\boldsymbol{u}_k) \qquad (1)\)
The function \(\tilde{p}(\boldsymbol{x})\) can be interpreted as an (unnormalized) probability density function. From this score, one can predict inlierness or outlierness of a data point. For example, one can say that \(\boldsymbol{x}\) is more anomalous than \(\boldsymbol{x}'\) if the inequality \(\tilde{p}(\boldsymbol{x}) < \tilde{p}(\boldsymbol{x}')\) holds. In the following, we consider the task of neuralizing the KDE model so that its inlier/outlier predictions can be explained.
3.1 Explaining Outlierness
A first question to ask is why a particular example \(\boldsymbol{x}\) is predicted by KDE to be an outlier, more specifically, what features of this example contribute to outlierness. As a first step, we consider what a suitable measure of outlierness is. The function \(\tilde{p}(\boldsymbol{x})\) produced by KDE decreases with outlierness, and it also saturates at zero even though outlierness continues to grow. A better measure of outlierness is given by [19]:
\(o(\boldsymbol{x}) = -\tfrac{1}{\gamma } \log \tilde{p}(\boldsymbol{x})\)
Unlike the function \(\tilde{p}(\boldsymbol{x})\), the function \(o(\boldsymbol{x})\) increases as the probability decreases. It also does not saturate as \(\boldsymbol{x}\) becomes more distant from the dataset. We now focus on neuralizing the outlier score \(o(\boldsymbol{x})\). We find that \(o(\boldsymbol{x})\) can be expressed as the two-layer neural network:
\(h_k = \Vert \boldsymbol{x}-\boldsymbol{u}_k\Vert ^2 \quad (\text {layer 1})\)
\(o(\boldsymbol{x}) = \text {LME}_k^{-\gamma }\{h_k\} \quad (\text {layer 2})\)
where \(\text {LME}_k^{\alpha }\{h_k\} = \frac{1}{\alpha } \log \big ( \frac{1}{N} \sum _{k=1}^N \exp (\alpha \, h_k)\big )\) is a generalized log-mean-exp pooling. The first layer computes the squared distance of the new example from each point in the dataset. The second layer (here applied with \(\alpha = -\gamma \)) can be interpreted as a soft min-pooling. The structure of the outlier computation is shown for a one-dimensional toy example in Fig. 2.
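The following sketch (NumPy, on a toy random dataset) illustrates the neuralized computation described above, with the first layer computing squared distances and the second layer applying the soft min-pooling \(\text {LME}^{-\gamma }\):

```python
import numpy as np

def kde_outlier_score(x, U, gamma):
    # layer 1: squared distances to every point of the dataset
    h = np.sum((U - x) ** 2, axis=1)
    # layer 2: soft min-pooling, i.e. log-mean-exp with exponent -gamma
    return -np.log(np.mean(np.exp(-gamma * h))) / gamma

U = np.random.randn(100, 2)       # unlabeled dataset
print(kde_outlier_score(np.array([0.0, 0.0]), U, gamma=1.0))   # low score: inlier region
print(kde_outlier_score(np.array([5.0, 5.0]), U, gamma=1.0))   # high score: outlier region
```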
This structure is particularly amenable to explanation. In particular, redistribution of \(o(\boldsymbol{x})\) in the intermediate layer can be achieved by a soft argmin operation, e.g.
\(R_k = \frac{\exp (-\beta \, h_k)}{\sum _{k'} \exp (-\beta \, h_{k'})} \cdot o(\boldsymbol{x})\)
where \(\beta \) is a hyperparameter to be selected. Then, propagation onto the input features can leverage the geometry of the distance function, by computing
\(R_i = \sum _k \frac{(x_i - u_{k,i})^2}{\Vert \boldsymbol{x}-\boldsymbol{u}_k\Vert ^2 + \epsilon } \, R_k\)
The hyperparameter \(\epsilon \) in the denominator is a stabilization term that ‘dissipates’ some of the relevance when \(\boldsymbol{x}\) and \(\boldsymbol{u}_k\) coincide.
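A possible implementation of the two propagation steps, again on toy data and with \(\beta \) and \(\epsilon \) as free hyperparameters, is sketched below (the exact rules used in [19] may differ in detail):

```python
import numpy as np

def explain_outlierness(x, U, gamma, beta, eps=1e-9):
    h = np.sum((U - x) ** 2, axis=1)                        # layer 1: squared distances
    o = -np.log(np.mean(np.exp(-gamma * h))) / gamma        # outlier score o(x)
    p = np.exp(-beta * h)
    p /= p.sum()                                            # soft arg-min weights over data points
    R_k = p * o                                             # relevance of each data point u_k
    contrib = (x - U) ** 2                                  # per-feature parts of ||x - u_k||^2
    R_i = (contrib / (h[:, None] + eps) * R_k[:, None]).sum(axis=0)
    return o, R_i

U = np.random.randn(200, 3)
o, R_i = explain_outlierness(np.array([3.0, 0.0, -2.0]), U, gamma=1.0, beta=1.0)
print(o, R_i.sum())                                         # R_i sums (approximately) to o
```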
Referring back to Sect. 2.1, we want to stress that computing the relevance of input features with LRP has the same computational complexity as a single forward pass, and does not require training an explainable surrogate model.
3.2 Explaining Inlierness: Direct Approach
In Sect. 3.1, we have focused on explaining what makes a given example an outlier. An equally important question to ask is why a given example \(\boldsymbol{x}\) is predicted by the KDE model to be an inlier. Inlierness is naturally modeled by the KDE output \(\tilde{p}(\boldsymbol{x})\). Hence, we can define the measure of inlierness as \(\mathbbm {i}(\boldsymbol{x}) \triangleq \tilde{p}(\boldsymbol{x})\). An inspection of Eq. (1) suggests the following two-layer neural network:
\(h_k = \exp (-\gamma \, \Vert \boldsymbol{x}-\boldsymbol{u}_k\Vert ^2) \quad (\text {layer 1})\)
\(\mathbbm {i}(\boldsymbol{x}) = \frac{1}{N} \sum _{k=1}^N h_k \quad (\text {layer 2})\)
The first layer evaluates Gaussian functions centered at the different data points, and the second layer performs an average pooling. We now consider the task of propagation. A natural way of redistributing in the top layer is in proportion to the activations. This gives us the scores
\(R_k = \tfrac{1}{N} \, h_k = \tfrac{1}{N} \exp (-\gamma \, \Vert \boldsymbol{x}-\boldsymbol{u}_k\Vert ^2)\)
A decomposition of \(R_k\) onto the input features is however difficult. The relevance \(R_k\) can be rewritten as a product:
\(R_k = \tfrac{1}{N} \prod _{i=1}^d \exp (-\gamma \, (x_i - u_{k,i})^2)\)
Observing that the contribution \(R_k\) can be made nearly zero by significantly perturbing any single input feature, we can conclude that every input feature contributes equally to \(R_k\) and should therefore be attributed an equal share of it. Application of this strategy for every neuron k would result in a uniform redistribution of the score \(\mathbbm {i}(\boldsymbol{x})\) to the input features. The explanation would therefore always be qualitatively the same, regardless of the data point \(\boldsymbol{x}\) and the overall shape of the inlier function \(\mathbbm {i}(\boldsymbol{x})\). While uniform attribution may be a good baseline, we usually strive for a more informative explanation.
3.3 Explaining Inlierness: Random Features Approach
To overcome the limitations of the approach above, we explore a second approach to explaining inlierness, where the neuralization is based on a feature map representation of the KDE model. For this, we first recall that any kernel-based model also admits a formulation in terms of the feature map \(\varPhi (\boldsymbol{x})\) associated with the kernel, i.e. \(\mathbb {K}(\boldsymbol{x},\boldsymbol{x}') = \langle \varPhi (\boldsymbol{x}),\varPhi (\boldsymbol{x}')\rangle \). In particular, Eq. (1) can be equivalently rewritten as:
\(\mathbbm {i}(\boldsymbol{x}) = \Big \langle \varPhi (\boldsymbol{x}), \, \tfrac{1}{N} \sum _{k=1}^N \varPhi (\boldsymbol{u}_k) \Big \rangle \qquad (2)\)
i.e. the inner product in feature space between the current example and the dataset mean. Here, we first recall that there is no explicit finite-dimensional feature map associated with the Gaussian kernel. However, such a feature map can be approximated using the framework of random features [36]. In particular, for a Gaussian kernel, features can be sampled as
\(\widehat{\varPhi }_j(\boldsymbol{x}) = \sqrt{2} \cos (\boldsymbol{\omega }_j^\top \boldsymbol{x} + b_j)\)
with \(\boldsymbol{\omega }_j \sim \mathcal {N}(\boldsymbol{\mu },\sigma ^2 I)\) and \(b_j \sim \mathcal {U}(0,2\pi )\), and where the mean and scale parameters of the Gaussian are \(\boldsymbol{\mu }=\boldsymbol{0}\) and \(\sigma = \sqrt{2 \gamma }\). The normalized dot product \(\frac{1}{H} \langle \widehat{\varPhi }(\boldsymbol{x}),\widehat{\varPhi }(\boldsymbol{x}') \rangle \) converges to the Gaussian kernel as more and more features are drawn. In practice, we settle for a fixed number H of features. Injecting the random features in Eq. (2) yields the two-layer architecture:
\(h_j = \sqrt{2} \cos (\boldsymbol{\omega }_j^\top \boldsymbol{x} + b_j) \, \mu _j \quad (\text {layer 1})\)
\(\widehat{\mathbbm {i}}(\boldsymbol{x}) = \tfrac{1}{H} \sum _{j=1}^H h_j \quad (\text {layer 2})\)
where \(\mu _j = \frac{1}{N}\sum _{k=1}^N \sqrt{2}\cos (\boldsymbol{\omega }_j^\top \boldsymbol{u}_k + b_j)\) and with \((\boldsymbol{\omega }_j,b_j)_j\) drawn from the distribution given above. This architecture produces at its output an approximation of the true inlierness score \(\mathbbm {i}(\boldsymbol{x})\) which becomes increasingly accurate as H becomes large. Here, the first layer is a detection layer with a cosine nonlinearity, and the second layer performs average pooling. The structure of the neural network computation is illustrated on our one-dimensional example in Fig. 3.
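The following sketch illustrates the random-features approximation for a toy two-dimensional dataset; the variable names and the comparison with the exact KDE output are for illustration only:

```python
import numpy as np

rng = np.random.default_rng(0)
gamma, H = 1.0, 2500
U = rng.normal(size=(100, 2))                               # unlabeled dataset

# random features for the Gaussian kernel exp(-gamma ||x - x'||^2)
omega = rng.normal(scale=np.sqrt(2 * gamma), size=(H, 2))   # omega_j ~ N(0, 2*gamma*I)
b = rng.uniform(0.0, 2 * np.pi, size=H)                     # b_j ~ U(0, 2*pi)

phi = lambda X: np.sqrt(2.0) * np.cos(X @ omega.T + b)      # feature map, shape (n, H)
mu = phi(U).mean(axis=0)                                    # mu_j: dataset mean in feature space

x = np.array([[0.3, -0.1]])
i_hat = (phi(x) * mu).sum() / H                             # approximate inlierness
i_kde = np.mean(np.exp(-gamma * np.sum((U - x) ** 2, axis=1)))
print(i_hat, i_kde)                                         # close for large H
```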
This structure of the inlierness computation is more amenable to explanation. In the top layer, the pooling operation can be attributed based on the summands. In other words, we can apply
\(R_j = \tfrac{1}{H} \, h_j\)
for the first step of redistribution of \(\widehat{\mathbbm {i}}(\boldsymbol{x})\). More importantly, in the first layer, the random features now have a clear directionality (given by the vectors \((\boldsymbol{\omega }_j)_j\)), which we can use for attribution onto the input features. In particular, we can apply the propagation rule:
Compared to the direct approach of Sect. 3.2, the explanation produced here assigns different scores to each input feature. Moreover, while the estimate of inlierness \(\widehat{\mathbbm {i}}(\boldsymbol{x})\) converges to the true KDE inlierness score \(\mathbbm {i}(\boldsymbol{x})\) as more random features are drawn, we observe a similar convergence for the explanation associated with the inlier prediction.
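A plausible implementation of the two redistribution steps is sketched below; the particular stabilized rule used to split each summand along the direction \(\boldsymbol{\omega }_j\) is an assumption of this sketch rather than the exact rule of the original work:

```python
import numpy as np

def explain_inlierness(x, omega, b, mu, eps=1e-6):
    """LRP-style attribution of the random-features inlierness score onto x (a sketch)."""
    z = omega @ x + b                                       # pre-activations omega_j^T x + b_j
    R_j = np.sqrt(2.0) * np.cos(z) * mu / len(mu)           # relevance of each random feature (summands)
    proj = omega * x                                        # directional contributions x_i * omega_ji
    denom = proj.sum(axis=1, keepdims=True)
    denom = denom + np.where(denom >= 0, eps, -eps)         # sign-matched stabilizer (dissipates some relevance)
    return (proj / denom * R_j[:, None]).sum(axis=0)        # relevance of each input feature

rng = np.random.default_rng(0)
H, d, gamma = 2500, 2, 1.0
omega = rng.normal(scale=np.sqrt(2 * gamma), size=(H, d))
b = rng.uniform(0.0, 2 * np.pi, size=H)
U = rng.normal(size=(100, d))
mu = (np.sqrt(2.0) * np.cos(U @ omega.T + b)).mean(axis=0)
print(explain_inlierness(np.array([0.3, -0.1]), omega, b, mu))
```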
4 K-Means Clustering
Another important class of unsupervised models is clustering. K-means is a popular algorithm for identifying clusters in the data. The k-means model represents each cluster c with a centroid \(\boldsymbol{\mu }_c \in \mathbb {R}^d\) corresponding to the mean of the cluster members. It assigns data points to clusters by first computing the distance between the data point and each cluster, e.g.
\(d_c(\boldsymbol{x}) = \Vert \boldsymbol{x} - \boldsymbol{\mu }_c\Vert \)
and then choosing the cluster with the lowest distance \(d_c(\boldsymbol{x})\). Once the data has been clustered, we often would like to gain an understanding of why a given data point has been assigned to a particular cluster, either for validating a given clustering model or for gaining novel insights into the cluster structure of the data.
4.1 Explaining Cluster Assignments
As a starting point for applying our explanation framework, we need to identify a function \(f_c(\boldsymbol{x})\) that adequately represents the assignment to a particular cluster c, e.g. a function that is larger than zero when the data point is assigned to the given cluster, and smaller than zero otherwise.
The distance function \(d_c(\boldsymbol{x})\) on which the clustering algorithm is based is, however, not directly suitable for the purpose of explanation. Indeed, \(d_c(\boldsymbol{x})\) tends to be inversely related to cluster membership, and it also does not take into account how far the data point is from other clusters. In [18], it is proposed to contrast the assigned cluster with the competing clusters. In particular, k-means cluster membership can be modeled as the difference of (squared) distances between the nearest competing cluster and the assigned cluster c:
\(f_c(\boldsymbol{x}) = \min _{k \ne c} \big \{ \Vert \boldsymbol{x} - \boldsymbol{\mu }_k\Vert ^2 \big \} - \Vert \boldsymbol{x} - \boldsymbol{\mu }_c\Vert ^2 \qquad (5)\)
The paper [18] shows that this contrastive strategy results in a two-layer neural network. In particular, Eq. (5) can be rewritten as the two-layer neural network:
\(h_k = \boldsymbol{w}_k^\top \boldsymbol{x} + b_k \quad (\text {layer 1})\)
\(f_c(\boldsymbol{x}) = \min _{k \ne c} \{h_k\} \quad (\text {layer 2})\)
where \(\boldsymbol{w}_k = 2 (\boldsymbol{\mu }_c - \boldsymbol{\mu }_k)\) and \(b_k = \Vert \boldsymbol{\mu }_k\Vert ^2 - \Vert \boldsymbol{\mu }_c\Vert ^2\). The first layer is a linear layer that depends on the centroid locations and provides a clear directionality in input space. The second layer is a hard min-pooling. Once the neural network structure of cluster membership has been extracted, we can proceed with explanation techniques such as LRP by first reverse-propagating cluster evidence in the top layer (contrasting the given cluster with all cluster competitors) and then further propagating in the layer below. In particular, we first apply the soft argmin redistribution
\(R_k = \frac{\exp (-\beta \, h_k)}{\sum _{k' \ne c} \exp (-\beta \, h_{k'})} \cdot f_c(\boldsymbol{x})\)
where \(\beta \) is a hyperparameter to be selected. An advantage of the soft argmin over its hard counterpart is that it does not create an abrupt transition between nearest competing clusters, which would otherwise cause nearly identical data points with the same cluster decision to receive substantially different explanations. Finally, the last step of redistribution onto the input features can be achieved by leveraging the orientation of the linear functions in the first layer, and applying the redistribution rule:
Overall, these two redistribution steps provide us with a way of meaningfully attributing the cluster evidence onto the input features.
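The following sketch puts the neuralization and the two redistribution steps together for a toy set of centroids; the last propagation rule (splitting each linear summand in proportion to \(x_i w_{ki}\)) is one plausible LRP-style choice and not necessarily the exact rule of [18]:

```python
import numpy as np

def explain_cluster_membership(x, centroids, c, beta=1.0, eps=1e-9):
    others = [k for k in range(len(centroids)) if k != c]
    W = 2.0 * (centroids[c] - centroids[others])             # w_k = 2 (mu_c - mu_k)
    b = (centroids[others] ** 2).sum(axis=1) - (centroids[c] ** 2).sum()
    h = W @ x + b                                            # layer 1: comparison with each competitor
    f_c = h.min()                                            # layer 2: hard min-pooling (cluster evidence)
    p = np.exp(-beta * h)
    p /= p.sum()                                             # soft arg-min over competing clusters
    R_k = p * f_c
    z = W * x                                                # contributions x_i * w_ki
    R_i = (z / (z.sum(axis=1, keepdims=True) + eps) * R_k[:, None]).sum(axis=0)
    return f_c, R_i

centroids = np.array([[0.0, 0.0], [4.0, 0.0], [0.0, 4.0]])
print(explain_cluster_membership(np.array([0.5, 0.2]), centroids, c=0))
```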
5 Experiments
We showcase the neuralization approaches presented above on two examples with two types of data: standard vector data representing wholesale customer spending behavior, and image data, more specifically, industrial inspection and scene images.
5.1 Wholesale Customer Analysis
Our first use case is the analysis of a wholesale customer dataset [11]. The dataset consists of 440 instances representing different customers, and for each instance, the annual consumption of the customer in monetary units (m.u.) for the categories ‘fresh’, ‘milk’, ‘grocery’, ‘frozen’, ‘detergents/paper’, and ‘delicatessen’ is given. Two additional geographic features are also part of this dataset; however, we do not include them in our experiment. We will place our focus on two particular data points, whose feature values are shown in Table 1.
Instance 338 has rather typical levels of spending across categories, in general slightly lower than average, but with high spending on frozen products. Instance 339 has more extreme spending, with almost no spending on fresh products and detergents/paper, and very high spending on frozen products.
To get further insights into the data, we construct a KDE model on the whole data and apply our analysis to the selected instances. Each input feature is first log-transformed and standardized (mean 0 and variance 1). We choose the kernel parameter \(\gamma =1\). We use a leave-one-out approach where the data used to build the KDE model is the whole dataset except the instance to be predicted and analyzed. The number of random features is set to \(H = 2500\) such that the computational complexity of the inlier model stays within one order of magnitude of that of the original kernel model. Predictions on the whole dataset and the analysis for the selected instances are shown in Fig. 4.
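For illustration, the preprocessing and leave-one-out evaluation described above could be sketched as follows; the file name and column names of the UCI dataset are assumptions of this sketch:

```python
import numpy as np
import pandas as pd

# hypothetical local copy of the UCI Wholesale Customers data
df = pd.read_csv("Wholesale customers data.csv")
cols = ["Fresh", "Milk", "Grocery", "Frozen", "Detergents_Paper", "Delicassen"]
X = np.log(df[cols].to_numpy(dtype=float))          # log transform
X = (X - X.mean(axis=0)) / X.std(axis=0)            # standardize to mean 0, variance 1

def kde_inlierness_loo(X, i, gamma=1.0):
    """Leave-one-out KDE score: the model is built on all instances except i."""
    U = np.delete(X, i, axis=0)
    return np.mean(np.exp(-gamma * np.sum((U - X[i]) ** 2, axis=1)))

# instances 338 and 339, assuming 1-based numbering of the instances
print(kde_inlierness_loo(X, 337), kde_inlierness_loo(X, 338))
```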
Instance 338 is predicted to be an inlier, which is consistent with our initial observation that the levels of spending across categories are on the lower end but remain usual. We can characterize this instance as a typical small customer. We also note that the feature ‘frozen’ contributes less to inlierness according to our analysis, probably due to the spending on that category being unusually high for a typical small customer.
Instance 339 has an inlierness score of almost zero, which is consistent with the observation in Table 1 that spending behavior is extreme for multiple product categories. The decomposition of an inlierness score of almost zero over the different categories is rather uninformative; hence, for this customer, we look at what explains outlierness (bottom of Fig. 4). We observe, as expected, that categories where spending behavior diverges for this instance are indeed strongly represented in the explanation of outlierness, with ‘fresh’, ‘milk’, ‘frozen’ and ‘detergents/paper’ contributing almost all evidence for outlierness. Surprisingly, we observe that the extremely low spending on ‘fresh’ is underrepresented in the outlierness score, compared to other categories such as ‘milk’ or ‘frozen’ where spending is less extreme. This apparent contradiction will be resolved by a cluster analysis.
Using the same logarithmic mapping and standardization step as for the KDE model, we now train a k-means model on the data and set the number of clusters to 6. Training is repeated 10 times with different centroid initializations, and we retain the model that has reached the lowest k-means objective. The outcome of the clustering is shown in Fig. 5 (left).
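The clustering setup can be reproduced, for example, with scikit-learn (a sketch; X stands for the preprocessed spending matrix):

```python
import numpy as np
from sklearn.cluster import KMeans

X = np.random.randn(440, 6)                                  # placeholder for the preprocessed spending matrix
km = KMeans(n_clusters=6, n_init=10, random_state=0).fit(X)  # 10 restarts, lowest objective kept
centroids = km.cluster_centers_                               # inputs to the neuralized membership model of Sect. 4
labels = km.labels_
```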
We observe that Instance 338 falls at the border between the green and red clusters, whereas Instance 339 lies well within the yellow cluster at the bottom. The decomposition of cluster evidence for these two instances is shown on the right. Because Instance 338 is at the border between two clusters, there is no evidence of membership in favor of one or the other cluster, and the decomposition of such (lack of) evidence results in an explanation that is zero for all categories. The decomposition of the cluster evidence for Instance 339, however, reveals that its cluster membership is mainly due to a singular spending pattern on the category ‘fresh’. To shed further light on this decision, we look at the cluster to which this instance has been assigned, in particular, the average spending of cluster members on each category. This information is shown in Table 2.
We observe that this cluster is characterized by low spending on fresh products and delicatessen. It may be a cluster of small retailers that, unlike supermarkets, do not have substantial refrigeration capacity. Hence, the very low level of spending of Instance 339 on ‘fresh’ products puts it well into that cluster, and it also explains why the outlierness of Instance 339 is not attributed to ‘fresh’ but to other features (cf. Fig. 4). In particular, what distinguishes Instance 339 from its cluster is a very high level of spending on frozen products, and this is also the category that contributes the most to outlierness of this instance according to our analysis of the KDE model.
Traditionally, cluster membership has been characterized by more basic approaches such as population statistics of individual features (e.g. [8]). Figure 6 shows such an analysis for Instances 338 and 339 of the Wholesale Customer Dataset. Although similar observations to the ones above can be made from this simple statistical analysis, e.g. the feature ‘frozen’ appears to contradict the membership of Instance 339 to Cluster 4, it is not clear from this simple analysis what makes Instance 339 a member of Cluster 4 in the first place. For example, while the feature ‘grocery’ of Instance 339 is within the interquartile range (IQR) of Cluster 4 and can therefore be considered typical of that cluster, other clusters have similar IQRs for that feature. Moreover, Instance 339 falls significantly outside Cluster 4’s IQR for other features. In comparison, our LRP approach more directly and reliably explains the cluster membership and outlierness of the considered instances. Furthermore, population statistics of individual features may be misleading for nonlinear models (such as kernel clustering) and do not scale to high-dimensional data, such as image data.
Overall, our analysis allows us to identify, on a single-instance basis, features that contribute to various properties relating the instance to the rest of the data, such as inlierness/outlierness and cluster membership. As our analysis has revealed, the insights that are obtained go well beyond a traditional data analysis based on looking at population statistics for individual features, or a simple inspection of unsupervised learning outcomes.
5.2 Image Analysis
Our next experiment looks at the explanation of inlierness, outlierness, and cluster membership for image data. Unlike the example above, relevant image statistics are better expressed at a more abstract level than directly on the pixels. A popular approach consists of using a pretrained neural network (e.g. the VGG-16 network [46]) and taking the activations produced at a certain layer as input.
We first consider the problem of anomaly detection for industrial inspection and use for this an image of the MVTec AD dataset [6], specifically, an image of wood where an anomalous horizontal scratch can be observed. The image is shown in Fig. 7 (left). We feed that image to a pretrained VGG-16 network and collect the activations at the output of Block 5 (i.e. at the output of the feature extractor). We consider each spatial location at the output of that block as a data point and build a KDE model (with \(\gamma =0.05\)) on the resulting dataset. We then apply our analysis to attribute the predicted inlierness/outlierness to the activations of Block 5. In practice, we need to consider the fact that any attribution on a deactivated neuron cannot be redistributed further to input pixels, as there is no pattern in pixel space to attach it to. Hence, the propagation procedure must be carefully implemented to address this constraint, possibly by only redistributing a limited share of the model output. The details are given in Appendix A. As a last step, we take the relevance scores computed at the output of Block 5 and pursue the relevance propagation procedure in the VGG-16 network using standard LRP rules until the pixels are reached. Explanations obtained for inlierness and outlierness of the wood image of interest are shown in Fig. 7.
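A sketch of how the dataset of Block 5 activations can be collected is given below, using torchvision; the exact preprocessing and the file name are assumptions of this sketch:

```python
import torch
from PIL import Image
from torchvision import models, transforms

vgg = models.vgg16(weights=models.VGG16_Weights.IMAGENET1K_V1).features.eval()
preprocess = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

img = preprocess(Image.open("wood_scratch.png").convert("RGB")).unsqueeze(0)   # hypothetical file name
with torch.no_grad():
    A = vgg(img)                                             # (1, 512, H', W'): output of the feature extractor
X = A.squeeze(0).permute(1, 2, 0).reshape(-1, 512).numpy()   # one 512-d data point per spatial location
```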
It can be observed that pixels associated with regular wood stripes are the main contributors to inlierness. In contrast, the horizontal scratch on the wood panel is a contributing factor for outlierness. Hence, with our explanation method, we can precisely identify, on a pixel-wise basis, the factors that contribute for or against predicted inlierness and outlierness.
We now consider an image from the SUN 2010 database [52], an indoor scene containing different pieces of furniture and home appliances. We consider the same VGG-16 network as in the experiment above and build a dataset by collecting activations at each spatial location of the output of Block 5. We then apply the k-means algorithm on this dataset with the number of clusters set to 5. Once the clustering model has been built, we rescale each cluster centroid to a fixed norm. We then apply our analysis to attribute the cluster membership scores to the activations at the output of Block 5. As for the industrial inspection example above, we must adjust the LRP rules so that deactivated neurons are not attributed relevance. The details of the LRP procedure are given in Appendix A. The obtained relevance scores are then propagated further to the input pixels using standard LRP rules. Resulting explanations are shown in Fig. 8.
We observe that different clusters identify distinct concepts. For example, one cluster focuses on the microwave oven and the surrounding cupboards, a second cluster represents the bottom part of the bar chairs, a third cluster captures the kitchen’s background with a particular focus on a painting on the wall, the fourth cluster captures various objects on the table and in the background, and a last cluster focuses on the top part of the chairs. While the clustering extracts distinct human-recognizable image features, it also shows some limits of the given representation: for example, the concept ‘bar chair’ is split into two distinct concepts (the bottom and the top part of the chair, respectively), whereas the clutter attached to Cluster 4 is not fully disentangled from the surrounding chairs and cupboards.
Overall, our experiments on image data demonstrate that neuralization of unsupervised learning models can be naturally integrated with existing procedures for explaining deep neural networks. This enables an application of our method to a broad range of practical problems where unsupervised modeling is better tackled at a certain level of abstraction and not directly in input space.
6 Conclusion and Outlook
In this chapter, we have considered the problem of explaining the predictions of unsupervised models; in particular, we have reviewed and extended the neuralization-propagation approach of [18, 19], which consists of rewriting, without retraining, the unsupervised model as a functionally equivalent neural network, and applying LRP in a second step. On two models of interest, kernel density estimation and k-means, we have highlighted a variety of techniques that can be used for neuralization. These include the identification of log-mean-exp pooling structures, the use of random features, and the transformation of a difference of (squared) distances into a linear layer. The capacity of our approach to deliver meaningful explanations was highlighted on two examples covering simple tabular data and images, including their mapping onto some layer of a convolutional network.
While our approach delivers good-quality explanations at low computational cost, a number of open questions remain to be addressed to further solidify the neuralization-propagation approach, and the explanation of unsupervised models in general.
A first question concerns the applicability of our method to a broader range of practical scenarios. We have highlighted how neuralized models can be built not only in input space but also on some layer of a deep neural network, thereby bringing explanations to much more complex unsupervised models. However, a much broader diversity of unsupervised learning algorithms is encountered in practice, including energy-based models [16], spectral methods [33, 44], linkage clustering [12], non-Euclidean methods [27], and prototype-based anomaly detection [14]. An important direction for future work will therefore be to extend the proposed framework to handle this heterogeneity of unsupervised machine learning approaches.
Another question is that of validation. There are many possible LRP propagation rules that one can define in practice, as well as potentially multiple neural network reformulations of the same unsupervised model. This creates a need for reliable techniques to evaluate the quality of different explanation methods. While techniques to evaluate explanation quality have been proposed and successfully applied in the context of supervised learning (e.g. based on feature removal [39]), further care needs to be taken in the unsupervised scenario, in particular, to avoid the outcome of the evaluation being spuriously affected by such feature removals. As an example, removing some feature responsible for a predicted anomaly may unintentionally cause a new artefact to be created in the data. This would in turn increase the anomaly score instead of lowering it as originally intended [19].
In addition to further extending and validating the neuralization-propagation approach, one needs to ask how to develop these explanation techniques beyond their usage as a simple visualization or data exploration tool. For example, it remains to be demonstrated whether these explanation techniques, in combination with user feedback, can be used to systematically verify and improve the unsupervised model at hand (e.g. as recently demonstrated for supervised models [2, 49]). Some initial steps have already been taken in this direction [20, 38].
References
Alber, M., et al.: iNNvestigate neural networks! J. Mach. Learn. Res. 20, 93:1–93:8 (2019)
Anders, C.J., Weber, L., Neumann, D., Samek, W., Müller, K.R., Lapuschkin, S.: Finding and removing Clever Hans: using explanation methods to debug and improve deep models. Inf. Fusion 77, 261–295 (2022)
Bach, S., Binder, A., Montavon, G., Klauschen, F., Müller, K.R., Samek, W.: On pixel-wise explanations for non-linear classifier decisions by layer-wise relevance propagation. PLoS ONE 10(7), e0130140 (2015)
Baehrens, D., Schroeter, T., Harmeling, S., Kawanabe, M., Hansen, K., Müller, K.R.: How to explain individual classification decisions. J. Mach. Learn. Res. 11, 1803–1831 (2010)
Bahdanau, D., Cho, K., Bengio, Y.: Neural machine translation by jointly learning to align and translate. In: ICLR (2015)
Bergmann, P., Batzner, K., Fauser, M., Sattlegger, D., Steger, C.: The MVTec anomaly detection dataset: a comprehensive real-world dataset for unsupervised anomaly detection. Int. J. Comput. Vis. 129(4), 1038–1059 (2021). https://doi.org/10.1007/s11263020014004
Blum, A., Langley, P.: Selection of relevant features and examples in machine learning. Artif. Intell. 97(1–2), 245–271 (1997)
Chapfuwa, P., Li, C., Mehta, N., Carin, L., Henao, R.: Survival cluster analysis. In: Ghassemi, M. (ed.) ACM Conference on Health, Inference, and Learning, pp. 60–68. ACM (2020)
Ciriello, G., Miller, M.L., Aksoy, B.A., Senbabaoglu, Y., Schultz, N., Sander, C.: Emerging landscape of oncogenic signatures across human cancers. Nat. Genet. 45(10), 1127–1133 (2013)
Craven, M.V., Shavlik, J.W.: Extracting treestructured representations of trained networks. In: NIPS, pp. 24–30. MIT Press (1995)
de Abreu, N.G.C.F.M.: Análise do perfil do cliente recheio e desenvolvimento de um sistema promocional. Master’s thesis, Instituto Universitário de Lisboa (2011)
Gower, J.C., Ross, G.J.S.: Minimum spanning trees and single linkage cluster analysis. Appl. Stat. 18(1), 54 (1969)
Guidotti, R., Monreale, A., Ruggieri, S., Turini, F., Giannotti, F., Pedreschi, D.: A survey of methods for explaining black box models. ACM Comput. Surv. 51(5), 93:1–93:42 (2019)
Harmeling, S., Dornhege, G., Tax, D., Meinecke, F., Müller, K.R.: From outliers to prototypes: ordering data. Neurocomputing 69(13–15), 1608–1618 (2006)
He, K., Zhang, X., Ren, S., Sun, J.: Deep residual learning for image recognition. In: CVPR, pp. 770–778. IEEE Computer Society (2016)
Hinton, G.E.: Training products of experts by minimizing contrastive divergence. Neural Comput. 14(8), 1771–1800 (2002)
Kau, A.K., Tang, Y.E., Ghose, S.: Typology of online shoppers. J. Consum. Mark. 20(2), 139–156 (2003)
Kauffmann, J.R., Esders, M., Montavon, G., Samek, W., Müller, K.R.: From clustering to cluster explanations via neural networks. CoRR, abs/1906.07633 (2019)
Kauffmann, J.R., Müller, K.R., Montavon, G.: Towards explaining anomalies: a deep Taylor decomposition of oneclass models. Pattern Recognit. 101, 107198 (2020)
Kauffmann, J.R., Ruff, L., Montavon, G., Müller, K.R.: The Clever Hans effect in anomaly detection. CoRR, abs/2006.10609 (2020)
Kim, J., Scott, C.D.: Robust kernel density estimation. J. Mach. Learn. Res. 13, 2529–2565 (2012)
Koren, Y., Bell, R.M., Volinsky, C.: Matrix factorization techniques for recommender systems. Computer 42(8), 30–37 (2009)
Krizhevsky, A., Sutskever, I., Hinton, G.E.: ImageNet classification with deep convolutional neural networks. In: NIPS, pp. 1106–1114 (2012)
Lapuschkin, S., Wäldchen, S., Binder, A., Montavon, G., Samek, W., Müller, K.R.: Unmasking Clever Hans predictors and assessing what machines really learn. Nat. Commun. 10(1096), 1–8 (2019)
Laskov, P., Rieck, K., Schäfer, C., Müller, K.R.: Visualization of anomaly detection using prediction sensitivity. In: Sicherheit, volume P62 of LNI, pp. 197–208. GI (2005)
Latecki, L.J., Lazarevic, A., Pokrajac, D.: Outlier detection with kernel density functions. In: Perner, P. (ed.) MLDM 2007. LNCS (LNAI), vol. 4571, pp. 61–75. Springer, Heidelberg (2007). https://doi.org/10.1007/9783540734994_6
Liu, F.T., Ting, K.M., Zhou, Z.: Isolation forest. In: Proceedings of the 8th IEEE International Conference on Data Mining, pp. 413–422. IEEE Computer Society (2008)
Liu, N., Shin, D., Hu, X.: Contextual outlier interpretation. In: IJCAI, pp. 2461–2467. ijcai.org (2018)
Lundberg, S.M., Lee, S.: A unified approach to interpreting model predictions. In: Advances in Neural Information Processing Systems, vol. 30, pp. 4765–4774 (2017)
Micenková, B., Ng, R.T., Dang, X., Assent, I.: Explaining outliers by subspace separability. In: ICDM, pp. 518–527. IEEE Computer Society (2013)
Montavon, G., Binder, A., Lapuschkin, S., Samek, W., Müller, K.R.: Layer-wise relevance propagation: an overview. In: Samek, W., Montavon, G., Vedaldi, A., Hansen, L.K., Müller, K.R. (eds.) Explainable AI: Interpreting, Explaining and Visualizing Deep Learning. LNCS (LNAI), vol. 11700, pp. 193–209. Springer, Cham (2019). https://doi.org/10.1007/9783030289546_10
Montúfar, G.F., Pascanu, R., Cho, K., Bengio, Y.: On the number of linear regions of deep neural networks. In: NIPS, pp. 2924–2932 (2014)
Ng, A.Y., Jordan, M.I., Weiss, Y.: On spectral clustering: analysis and an algorithm. In: NIPS, pp. 849–856. MIT Press (2001)
Nguyen, A., Dosovitskiy, A., Yosinski, J., Brox, T., Clune, J.: Synthesizing the preferred inputs for neurons in neural networks via deep generator networks. In: NIPS, pp. 3387–3395 (2016)
Parzen, E.: On estimation of a probability density function and mode. Ann. Math. Stat. 33(3), 1065–1076 (1962)
Rahimi, A., Recht, B.: Random features for largescale kernel machines. In: NIPS, pp. 1177–1184 (2007)
Ribeiro, M.T., Singh, S., Guestrin, C.: “Why should I trust you?”: Explaining the predictions of any classifier. In: KDD, pp. 1135–1144. ACM (2016)
Ruff, L., et al.: A unifying review of deep and shallow anomaly detection. Proc. IEEE 109(5), 756–795 (2021)
Samek, W., Binder, A., Montavon, G., Lapuschkin, S., Müller, K.R.: Evaluating the visualization of what a deep neural network has learned. IEEE Trans. Neural Netw. Learn. Syst. 28(11), 2660–2673 (2017)
Samek, W., Montavon, G., Lapuschkin, S., Anders, C.J., Müller, K.R.: Explaining deep neural networks and beyond: a review of methods and applications. Proc. IEEE 109(3), 247–278 (2021)
Samek, W., Montavon, G., Vedaldi, A., Hansen, L.K., Müller, K.R. (eds.): Explainable AI: Interpreting, Explaining and Visualizing Deep Learning. LNCS (LNAI), vol. 11700. Springer, Cham (2019). https://doi.org/10.1007/9783030289546
Selvaraju, R.R., Cogswell, M., Das, A., Vedantam, R., Parikh, D., Batra, D.: Grad-CAM: visual explanations from deep networks via gradient-based localization. Int. J. Comput. Vis. 128(2), 336–359 (2020)
Shapley, L.S.: 17. A value for n-person games. In: Contributions to the Theory of Games (AM-28), vol. II. Princeton University Press (1953)
Shi, J., Malik, J.: Normalized cuts and image segmentation. IEEE Trans. Pattern Anal. Mach. Intell. 22(8), 888–905 (2000)
Simonyan, K., Vedaldi, A., Zisserman, A.: Deep inside convolutional networks: visualising image classification models and saliency maps. In: ICLR (Workshop Poster) (2014)
Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. In: ICLR (2015)
Springenberg, J.T., Dosovitskiy, A., Brox, T., Riedmiller, M.: Striving for simplicity: the all convolutional net. In: ICLR (Workshop) (2015)
Strumbelj, E., Kononenko, I.: An efficient explanation of individual classifications using game theory. J. Mach. Learn. Res. 11, 1–18 (2010)
Sun, J., Lapuschkin, S., Samek, W., Binder, A.: Explain and improve: LRPinference fine tuning for image captioning models. Inf. Fusion 77, 233–246 (2022)
Sundararajan, M., Taly, A., Yan, Q.: Axiomatic attribution for deep networks. In: ICML, Proceedings of Machine Learning Research, vol. 70, pp. 3319–3328. PMLR (2017)
von Luxburg, U., Williamson, R.C., Guyon, I.: Clustering: science or art? In: ICML Unsupervised and Transfer Learning, JMLR Proceedings, vol. 27, pp. 65–80. JMLR.org (2012)
Xiao, J., Ehinger, K.A., Hays, J., Torralba, A., Oliva, A.: SUN database: exploring a large collection of scene categories. Int. J. Comput. Vis. 119(1), 3–22 (2016)
Zeiler, M.D., Fergus, R.: Visualizing and understanding convolutional networks. In: Fleet, D., Pajdla, T., Schiele, B., Tuytelaars, T. (eds.) ECCV 2014. LNCS, vol. 8689, pp. 818–833. Springer, Cham (2014). https://doi.org/10.1007/9783319105901_53
Zintgraf, L.M., Cohen, T.S., Adel, T., Welling, M.: Visualizing deep neural network decisions: prediction difference analysis. In: ICLR (Poster). OpenReview.net (2017)
Acknowledgements
This work was supported by the German Ministry for Education and Research under Grant 01IS14013AE, Grant 01GQ1115, Grant 01GQ0850, as BIFOLD (ref. 01IS18025A and ref. 01IS18037A) and Patho234 (ref. 031LO207), the European Union’s Horizon 2020 programme (grant no. 965221), and the German Research Foundation (DFG) as Math+: Berlin Mathematics Research Center (EXC 2046/1, projectID: 390685689). This work was supported in part by the Institute of Information & Communications Technology Planning & Evaluation (IITP) grants funded by the Korea Government under Grant 2017000451 (Development of BCI Based Brain and Cognitive Computing Technology for Recognizing User’s Intentions using Deep Learning) and Grant 2019000079 (Artificial Intelligence Graduate School Program, Korea University).
A Attribution on CNN Activations
The propagation rules mentioned in Sects. 3 and 4 are not suited for identifying relevant neurons at some layer of a neural network when the goal is to propagate the relevance further down the layers, e.g. to obtain a pixel-wise explanation. What we need to ensure in such a scenario is that all relevant information is expressed in terms of activated neurons, as they are the only ones for which the associated relevance can be grounded to a specific pattern in pixel space. One possible approach is to decompose the relevance propagation into a propagating term and a non-propagating (or ‘dissipating’) one, which leads to a partial (although still useful) explanation. In the following, we describe the approaches we have taken to achieve our extension of explanations to deep models.
A.1 Attributing Outlierness
The activations in the first layer of the neuralized outlier model are \(h_k = \Vert \boldsymbol{a} - \boldsymbol{u}_k\Vert ^2\), where \(\boldsymbol{a}\) denotes the vector of activations at the chosen layer serving as input to the model,
and the relevance that arrives on the corresponding neuron is given by \(R_k = p_k \, \text {LME}_{k'}^{-\gamma }\{h_{k'}\}\) with \(p_k = \frac{\exp (-\beta h_k)}{\sum _{k'} \exp (-\beta h_{k'})}\). The relevance associated with neuron k can be expressed as:
where we have used the commutativity of the LME function and the distributivity of the squared norm to decompose the relevance into two terms, one that can be meaningfully redistributed onto the activations, and one that cannot be redistributed. Redistribution in the first layer can then proceed as:
It is easy to demonstrate from this equation that any neuron with \(a_i=0\) (i.e. deactivated) will not be attributed any relevance.
A.2 Attributing Inlierness
Neurons in the first layer of the inlierness model based on random features have activations given by:
\(h_j = \sqrt{2} \cos (\boldsymbol{\omega }_j^\top \boldsymbol{a} + b_j) \, \mu _j\)
and relevance scores \(R_j = h_j / H\). Using a simple trigonometric identity, we can rewrite the relevance scores in terms of unphased sine and cosine functions as:
\(R_j = c_j \cos (b_j) \cos (\boldsymbol{\omega }_j^\top \boldsymbol{a}) - c_j \sin (b_j) \sin (\boldsymbol{\omega }_j^\top \boldsymbol{a}) \, \triangleq \, R_j^{\cos } + R_j^{\sin }\)
where \(c_j = \frac{1}{H}\sqrt{2} \mu _j\). We propose the redistribution rule:
where \(\epsilon _j\) is a term set to be of the same sign as the denominator, which addresses the case where a positive \(R_j^{\cos }\) comes with a near-zero response \(\boldsymbol{\omega }_j^\top \boldsymbol{a}\), by ‘dissipating’ some of the relevance \(R_j^{\cos }\).
A.3 Attributing Cluster Membership
The activation in the first layer of the neuralized cluster membership model is \(h_k = \boldsymbol{w}_k^\top \boldsymbol{a} + b_k\) (with \(\boldsymbol{w}_k\) and \(b_k\) defined as in Sect. 4),
and the relevance score is given by \(R_k = p_k \min _{k' \ne c}\{h_{k'}\}\) with \(p_k = \frac{\exp (-\beta h_k)}{\sum _{k'} \exp (-\beta h_{k'})}\). Similar to the outlier case, we decompose the relevance score as:
and only consider the first term for propagation. Specifically, we apply the propagation rule:
where it can again be shown that only activated neurons are attributed relevance.