A Rate-Distortion Framework for Explaining Black-Box Model Decisions

Kolek, Stefan; Nguyen, Duc Anh; Levie, Ron; Bruna, Joan; Kutyniok, Gitta

doi:10.1007/978-3-031-04083-2_6

Part of the book series: Lecture Notes in Computer Science (LNAI, volume 13200)

Included in the following conference series:

International Workshop on Extending Explainable AI Beyond Deep Models and Classifiers

Abstract

We present the Rate-Distortion Explanation (RDE) framework, a mathematically well-founded method for explaining black-box model decisions. The framework is based on perturbations of the target input signal and applies to any differentiable pre-trained model such as neural networks. Our experiments demonstrate the framework's adaptability to diverse data modalities, in particular images, audio, and physical simulations of urban environments.

1 Introduction

Powerful machine learning models such as deep neural networks are inherently opaque. This has motivated numerous explanation methods developed by the research community over the last decade [1, 2, 7, 15, 16, 20, 26, 29]. The meaning and validity of an explanation depend on the underlying principle of the explanation framework. Therefore, a trustworthy explanation framework must align intuition with mathematical rigor while maintaining maximal flexibility and applicability. We believe that the Rate-Distortion Explanation (RDE) framework, first proposed in [16] and later extended in [9], as well as the similar framework in [2], has the desired qualities. In this chapter, we present the RDE framework in a revised and holistic manner. Our generalized RDE framework can be applied to any model (not just classifiers), supports in-distribution interpretability (by leveraging in-painting GANs), and admits interpretation queries (by considering suitable input signal representations).

The typical setting of a (local) explanation method is given by a pre-trained model $\varPhi :\mathbb {R}^n\rightarrow \mathbb {R}^m$ and a data instance $x\in \mathbb {R}^n$. The model $\varPhi $ can be either a classifier with m class labels or a regression model with m-dimensional output. The model decision $\varPhi (x)$ is to be explained. In the original RDE framework [16], an explanation for $\varPhi (x)$ is a set of feature components $S\subset \left\{ 1,\ldots ,n\right\} $ in x that are deemed relevant for the decision $\varPhi (x)$. The core principle behind the RDE framework is that a set $S\subset \left\{ 1,\ldots ,n\right\} $ contains all the relevant components if $\varPhi (x)$ remains (approximately) unchanged after modifying $x_{S^c}$, i.e., the components of x that are not deemed relevant. In other words, S contains all relevant features if these features alone suffice to produce the output $\varPhi (x)$. To convey concise explanatory information, one aims to find the minimal set $S\subset \left\{ 1,\ldots ,n\right\} $ containing all the relevant components. As demonstrated in [16] and [31], the minimal relevant set $S\subset \left\{ 1,\ldots ,n\right\} $ cannot be found efficiently by combinatorial search for large input sizes. A meaningful approximation can nevertheless be found by optimizing a sparse continuous mask $s\in [0,1]^n$ that has no significant effect on the output $\varPhi (x)$. Here, no significant effect means that $\varPhi (x)\approx \varPhi (x\odot s + (1-s)\odot v)$ should hold for appropriate perturbations $v\in \mathbb {R}^n$, where $\odot $ denotes componentwise multiplication. Suppose $d\big (\varPhi (x),\varPhi (y)\big )$ is a measure of distortion (e.g., the $\ell _2$-norm of the difference) between the model outputs for $x,y\in \mathbb {R}^n$, and $\mathcal {V}$ is a distribution over appropriate perturbations $v\sim \mathcal {V}$. An explanation in the RDE framework can then be found as a solution mask $s^*$ to the following minimization problem:

$$\begin{aligned} s^* := \quad \mathop {\text {arg min}}\limits _{s\in [0,1]^n} \mathop {\mathbb {E}}_{v\sim \mathcal {V}}\Bigg [d\Big (\varPhi (x),\varPhi (x\odot s + (1-s)\odot v)\Big )\Bigg ] + \lambda \Vert s\Vert _1, \end{aligned}$$

where $\lambda >0$ is a hyperparameter controlling the sparsity of the mask.

We further generalize the RDE framework to abstract input signal representations $x=f(h)$, where f is a data representation function with input h. The philosophy of the generalized RDE framework is that an explanation for a generic input signal $x=f(h)$ should be a simplified version of the signal that is interpretable to humans. This is achieved by demanding sparsity in a suitable representation system h. Ideally, that system is optimally suited to represent the class of explanations that are desirable for the underlying domain and interpretation query. This philosophy underpins our experiments on image classification in the wavelet domain, on audio signal classification in the Fourier domain, and on radio map estimation in an urban environment domain. These experiments demonstrate the versatility of our generalized RDE framework.

2 Related Works

To our knowledge, the explanation principle of optimizing a mask $s\in [0,1]^n$ was first proposed in [7]. Fong et al. [7] explained image classification decisions by considering one of two "deletion games": (1) optimizing for the smallest deletion mask that causes the class score to drop significantly, or (2) optimizing for the largest deletion mask that has no significant effect on the class score. The original RDE approach [16] is based on the second deletion game and connects the deletion principle to rate-distortion theory, which studies lossy data compression. Deleted entries in [7] were replaced with constants, noise, or blurred values, whereas deleted entries in [16] were replaced with noise.

Explanation methods introduced before the "deletion games" principle of [7] were typically based on gradients [26, 29], propagation of activations in neurons [1, 25], surrogate models [20], or game theory [15]. Gradient-based methods such as SmoothGrad [26] lack a principle of relevance beyond local sensitivity. Reference-based methods such as Integrated Gradients [29] and DeepLIFT [25] depend on a reference value, for which there is no clear optimal choice. DeepLIFT and LRP assign relevance by propagating neuron activations, which makes them dependent on the implementation of $\varPhi $. LIME [20] uses an interpretable surrogate model that approximates $\varPhi $ in a neighborhood around x. Surrogate model explanations are inherently limited for complex models $\varPhi $ (such as image classifiers), as they only admit very local approximations. More generally, explanations that depend only on the model behavior in a small neighborhood $U_x$ of x offer limited insight. Lastly, Shapley-value-based explanations [15] are grounded in Shapley values from game theory. They assign relevance scores as weighted averages of the marginal contributions of the respective features. Though Shapley values are mathematically well-founded, relevance scores cannot be computed exactly for common input sizes such as $n\ge 50$. A single exact relevance score generally requires $O(2^n)$ evaluations of $\varPhi $ [30].

A notable difference between the RDE method and additive feature explanations [15] is that the values in the mask $s^*$ do not add up to the model output. The additive property as in [15] takes the view that features individually contribute to the model output and relevance should be reflected by their contributions. We emphasize that the RDE method is designed to look for a set of relevant features and not an estimate of individual relative contributions. This is particularly desirable when only groups of features are interpretable, as for example in image classification tasks, where individual pixels do not carry any interpretable meaning. Similarly to Shapley values, the explanation in the RDE framework cannot be computed exactly, as it requires solving a non-convex minimization problem. However, the RDE method can take full advantage of modern optimization techniques. Furthermore, the RDE method is a model-agnostic explanation technique, with a mathematically principled and intuitive notion of relevance as well as enough flexibility to incorporate the model behavior on meaningful input regions of $\varPhi $.

The meaning of an explanation based on deletion masks $s\in [0,1]^n$ depends on the nature of the perturbations that replace the deleted regions. Random [7, 16] or blurred [7] replacements $v\in \mathbb {R}^n$ may result in a data point $x\odot s + (1-s)\odot v$ that falls outside the natural data manifold on which $\varPhi $ was trained. This is a subtle but important problem, since such an explanation may depend on evaluations of $\varPhi $ on data points from undeveloped decision regions. This motivates in-distribution interpretability, which considers meaningful perturbations that keep $x\odot s + (1-s)\odot v$ on the data manifold. The authors of [2] were the first to suggest using an in-painting GAN to generate meaningful perturbations for the "deletion games". The authors of [9] then applied in-distribution interpretability to the RDE method in the challenging modalities of music and physical simulations of urban environments. Moreover, they demonstrated that the RDE method in [16] can be extended to answer so-called "interpretation queries". For example, the RDE method was applied in [9] to an instrument classifier to answer the global interpretation query "Is magnitude or phase in the signal more important for the classifier?". Most recently, in [11], we introduced CartoonX as a novel explanation method for image classifiers. CartoonX answers the interpretation query "What is the relevant piece-wise smooth part of an image?" by applying RDE in the wavelet basis of images.

3 Rate-Distortion Explanation Framework

Based on the original RDE approach from [16], in this section we present a general formulation of the RDE framework and discuss several implementations. While [16] focuses solely on image classification with explanations in pixel representation, we apply the RDE framework to more challenging domains and to different input signal representations. Unsurprisingly, the combinatorial optimization problem in the RDE framework, even in simpler form, is extremely hard to solve [16, 31]. This motivates heuristic solution strategies, which are discussed in Subsect. 3.2.

3.1 General Formulation

It is well-known that in practice there are different ways to describe a signal $x \in \mathbb {R}^n$. Generally speaking, x can be represented by a data representation function $f:\prod _{i=1}^k\mathbb {R}^{d_i}\rightarrow \mathbb {R}^n$,

$$\begin{aligned} x = f(h_1, \ldots , h_k), \end{aligned}$$

(1)

for some inputs $h_i \in \mathbb {R}^{d_i}$, $d_i \in \mathbb {N}$, $i \in \left\{ 1, \ldots , k\right\} $, $k\in \mathbb {N}$. Note that we do not restrict ourselves to linear data representation functions f. To briefly illustrate the generality of this abstract representation, we consider the following examples.

Example 1 (Pixel representation)

An arbitrary (vectorized) image $x \in \mathbb {R}^n$ can be simply represented pixelwise

$$\begin{aligned} x = \begin{bmatrix} x_1 \\ \vdots \\ x_n\end{bmatrix} = f(h_1, \ldots , h_n), \end{aligned}$$

with $h_i := x_i$ being the individual pixel values and $f :\mathbb {R}^n \rightarrow \mathbb {R}^n$ being the identity transform.

Due to its simplicity, this standard basis representation is a reasonable choice when explaining image classification models. However, in many other applications, one requires more sophisticated representations of the signals, such as through a possibly redundant dictionary.

Example 2

Let $\left\{ \psi _j\right\} _{j =1}^k$, $k \in \mathbb {N}$, be a dictionary in $\mathbb {R}^n$, e.g., a basis. A signal $x \in \mathbb {R}^n$ is represented as

$$\begin{aligned} x = \sum _{j =1}^k h_j \psi _j, \end{aligned}$$

where $h_j\in \mathbb {R}$, $j \in \left\{ 1, \ldots , k\right\} $, are appropriate coefficients. In terms of the abstract representation (1), we have $d_j = 1$ for $j \in \left\{ 1, \ldots , k\right\} $ and f is the function that yields the weighted sum over $\psi _j$. Note that Example 1 can be seen as a special case of this representation.

The following gives an example of a non-linear representation function f.

Example 3

Consider the discrete inverse Fourier transform, defined as

$$\begin{aligned}&f: \prod _{j=1}^{n}\mathbb {R}_+ \times \prod _{j=1}^n[0,2\pi ] \rightarrow \mathbb {C}^n,\\ {}&\big [f(m_1,...,m_n,\omega _1,...,\omega _n)\big ]_l:= \frac{1}{n} \sum _{j=1}^{n} \underbrace{m_je^{i\omega _j}}_{:= c_j\in \mathbb {C}}e^{i2\pi l(j-1)/n}, \; l \in \left\{ 1, \ldots , n\right\} , \end{aligned}$$

where $m_j$ and $\omega _j$ are the magnitude and the phase, respectively, of the j-th discrete Fourier coefficient $c_j$. Thus, every signal $x \in \mathbb {R}^n \subseteq \mathbb {C}^n$ can be represented in terms of (1), with f being the discrete inverse Fourier transform and with $k=2n$ scalar inputs, namely $h_{j} := m_{j}$ and $h_{n+j} := \omega _{j}$ for $j = 1, \ldots , n$.
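As a sanity check of Example 3, the following minimal sketch (standard library only) maps a real signal to magnitudes and phases and reconstructs it with f. The exponent $l(j-1)$ with $l\in \{1,\ldots ,n\}$ differs from the usual 0-based DFT convention by a cyclic shift of the output index. We therefore pair f with a forward transform written in the same convention rather than calling an FFT routine. The function names `magnitude_phase` and `f` are illustrative.

```python
import cmath
import math


def magnitude_phase(x):
    """Forward DFT in the index convention of Example 3:
    c_j = sum_l x_l e^{-i 2 pi l (j-1) / n}, returned as magnitudes m_j >= 0
    and phases omega_j mapped into [0, 2*pi]."""
    n = len(x)
    coeffs = [
        sum(x[l - 1] * cmath.exp(-2j * math.pi * l * (j - 1) / n) for l in range(1, n + 1))
        for j in range(1, n + 1)
    ]
    return [abs(c) for c in coeffs], [cmath.phase(c) % (2 * math.pi) for c in coeffs]


def f(m, omega):
    """Representation function of Example 3:
    [f(m, omega)]_l = (1/n) sum_j m_j e^{i omega_j} e^{i 2 pi l (j-1) / n}."""
    n = len(m)
    return [
        sum(m[j - 1] * cmath.exp(1j * omega[j - 1]) * cmath.exp(2j * math.pi * l * (j - 1) / n)
            for j in range(1, n + 1)) / n
        for l in range(1, n + 1)
    ]


x = [1.0, 2.0, 0.5, -1.0]
m, omega = magnitude_phase(x)
print([round(value.real, 6) for value in f(m, omega)])  # recovers x up to rounding
```

The reconstruction is complex-valued in general. For a real input, its imaginary parts vanish up to floating-point error, in line with the embedding $\mathbb {R}^n\subseteq \mathbb {C}^n$.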

Further examples of dictionaries $\left\{ \psi _j\right\} _{j=1}^k$ include the discrete wavelet [21], cosine [19], and shearlet [12] representation systems, among many others. In these cases, the coefficients $h_i$ are given by the forward transform, and f is referred to as the backward (inverse) transform. Note that in the above examples we have $d_i = 1$, i.e., the inputs $h_i$ are scalars. In many situations, one is also interested in representations $x = f(h_1, \ldots , h_k)$ with $h_i \in \mathbb {R}^{d_i}$ where $d_i >1$.

Example 4

Let $k=2$ and define f again as the discrete inverse Fourier transform, but as a function of two components: (1) the entire magnitude spectrum and (2) the entire phase spectrum, namely

$$\begin{aligned}&f: \mathbb {R}_+^n \times [0,2\pi ]^n \rightarrow \mathbb {C}^n,\\&\big [f(m,\omega )\big ]_l := \frac{1}{n} \sum _{j=1}^{n} \underbrace{m_j e^{i\omega _j}}_{:= c_j\in \mathbb {C}}e^{i2\pi l(j-1)/n},\; l \in \left\{ 1, \ldots , n\right\} . \end{aligned}$$

Similarly, instead of individual pixel values, one can consider patches of pixels in an image $x \in \mathbb {R}^n$ from Example 1 as the input vectors $h_i$ to the identity transform f. We will come back to these examples in the experiments in Sect. 4.

Finally, we would like to remark that our abstract representation

$$x = f(h_1,\ldots ,h_k)$$

also covers the cases where the signal is the output of a decoder or generative model f with inputs $h_1, \ldots , h_k$ as the code or the latent variables.

As discussed in the previous sections, the main idea of the RDE framework is to extract the relevant features of a signal by optimizing over perturbations of the signal defined through masks. The ingredients of this idea are formally defined below.

Definition 1 (Obfuscations and expected distortion)

Let $\varPhi :\mathbb {R}^n\rightarrow \mathbb {R}^m$ be a model and $x\in \mathbb {R}^n $ a data point with a data representation $x =f(h_1,...,h_k)$ as discussed above. For every mask $s\in [0,1]^k$, let $\mathcal {V}_s$ be a probability distribution over $\prod _{i=1}^k\mathbb {R}^{d_i}$. Then the obfuscation of x with respect to s and $\mathcal {V}_s$ is defined as the random vector

$$\begin{aligned} y := f(s\odot h + (1-s)\odot v), \end{aligned}$$

where $v\sim \mathcal {V}_s$, $(s\odot h)_i = s_i h_i\in \mathbb {R}^{d_i}$ and $((1-s)\odot v)_i= (1-s_i)v_i\in \mathbb {R}^{d_i}$ for $i\in \left\{ 1, \ldots , k\right\} $. Furthermore, the expected distortion of x with respect to the mask s and the perturbation distribution $\mathcal {V}_s$ is defined as

$$ D(x,s,\mathcal {V}_s, \varPhi ):= \mathop {\mathbb {E}}_{v\sim \mathcal {V}_s} \Bigg [ d\Big (\varPhi (x), \varPhi (y)\Big )\Bigg ], $$

where $d:\mathbb {R}^m\times \mathbb {R}^m\rightarrow \mathbb {R}_+$ is a measure of distortion between two model outputs.
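Definition 1 can be sketched in plain Python as follows. The sketch assumes that h and v are lists of k groups, with group i holding the $d_i$ components of $h_i$ or $v_i$. It also assumes that f, $\varPhi $ and d are arbitrary Python callables. Finally, `sample_v(s, rng)` is a hypothetical user-supplied sampler that returns one draw $v\sim \mathcal {V}_s$. The expectation is replaced by a Monte-Carlo average over a fixed, illustrative number of samples.

```python
import random


def obfuscate(f, h, s, v):
    """Obfuscation y = f(s * h + (1 - s) * v) from Definition 1.

    h and v are lists of k groups (each a list of d_i floats); the mask s has one
    entry s_i in [0, 1] per group, applied to every component of h_i and v_i.
    """
    mixed = [
        [s_i * a + (1.0 - s_i) * b for a, b in zip(h_i, v_i)]
        for s_i, h_i, v_i in zip(s, h, v)
    ]
    return f(mixed)


def expected_distortion(phi, d, f, h, s, sample_v, num_samples=256, seed=0):
    """Monte-Carlo estimate of D(x, s, V_s, Phi) = E_v[d(Phi(x), Phi(y))]."""
    rng = random.Random(seed)
    phi_x = phi(f(h))
    total = 0.0
    for _ in range(num_samples):
        v = sample_v(s, rng)  # one draw v ~ V_s
        total += d(phi_x, phi(obfuscate(f, h, s, v)))
    return total / num_samples


# Toy usage: f concatenates the groups, and Phi only looks at the first group.
f = lambda groups: [c for g in groups for c in g]
phi = lambda y: y[0] + y[1]
d = lambda a, b: (a - b) ** 2
sample_v = lambda s, rng: [[rng.gauss(0.0, 1.0) for _ in range(2)] for _ in range(2)]
h = [[1.0, 2.0], [3.0, 4.0]]
print(expected_distortion(phi, d, f, h, [1.0, 0.0], sample_v))  # keeps the first group: 0.0
print(expected_distortion(phi, d, f, h, [0.0, 1.0], sample_v))  # estimate of 3**2 + 2 = 11
```

Because a single mask entry $s_i$ acts on the whole group $h_i\in \mathbb {R}^{d_i}$, the same code covers scalar coefficients ($d_i=1$) and grouped inputs such as the spectra in Example 4.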

In the RDE framework, the explanation is given by a mask that minimizes distortion while remaining relatively sparse. The rate-distortion explanation (RDE) mask is defined as follows.

Definition 2 (The RDE mask)

In the setting of Definition 1 we define the RDE mask as a solution $s^*(\ell )$ to the minimization problem

$$\begin{aligned} \min _{s\in \{0,1\}^k} \quad D(x,s,\mathcal {V}_s, \varPhi ) \quad \text { s.t. } \quad \left\| s \right\| _0 \le \ell , \end{aligned}$$

(2)

where $\ell \in \left\{ 1, \ldots , k\right\} $ is the desired level of sparsity.
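For very small k, problem (2) can be solved exactly by enumeration, which makes its combinatorial cost visible: there are $\sum _{r=0}^{\ell }\binom{k}{r}$ candidate masks. The sketch below assumes a user-supplied function `distortion(s)` that returns $D(x,s,\mathcal {V}_s,\varPhi )$ for a binary mask s.

The usage example rests on three illustrative assumptions:

- a linear model $\varPhi (y)=\langle w,y\rangle $,
- perturbations $\mathcal {V}_s=\mathcal {N}(0,\text {Id})$ in pixel representation,
- the squared difference as distortion.

Under these assumptions the expected distortion has the closed form $\big (\sum _{i:s_i=0} w_ix_i\big )^2 + \sum _{i:s_i=0} w_i^2$.

```python
from itertools import combinations


def rde_mask_bruteforce(distortion, k, ell):
    """Exact solution of problem (2): enumerate all binary masks with ||s||_0 <= ell.

    Feasible only for tiny k, since the number of candidates grows combinatorially.
    """
    best_mask, best_value = None, float("inf")
    for r in range(ell + 1):
        for support in combinations(range(k), r):
            s = tuple(1 if i in support else 0 for i in range(k))
            value = distortion(s)
            if value < best_value:
                best_mask, best_value = s, value
    return best_mask, best_value


w, x = [2.0, 0.0, -1.0, 0.1], [1.0, 1.0, 1.0, 1.0]


def toy_distortion(s):
    # Exact E[(Phi(x) - Phi(y))^2] for Phi(y) = <w, y> and v ~ N(0, Id)
    deleted = [i for i in range(len(s)) if s[i] == 0]
    return sum(w[i] * x[i] for i in deleted) ** 2 + sum(w[i] ** 2 for i in deleted)


print(rde_mask_bruteforce(toy_distortion, k=4, ell=2))  # mask (1, 0, 1, 0), value close to 0.02
```

Note that keeping feature 2 alone would increase the distortion, because its contribution partly cancels that of feature 0. The best mask with $\ell =2$ keeps both features 0 and 2. Interactions of this kind make the problem genuinely combinatorial.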

Here, the RDE mask is defined as the binary mask that minimizes the expected distortion subject to having at most $\ell $ nonzero entries. Alternatively, as in [16], one could define the RDE mask as the sparsest binary mask that keeps the distortion below a given threshold. Geometrically, one can interpret the RDE mask as a subspace that is stable under $\varPhi $. Let $x=f(h)$ be the input signal and let s be the RDE mask for $\varPhi (x)$ on the coefficients h. The associated subspace $R_\varPhi (s)$ is then defined as the space of feasible obfuscations of x with s under $\mathcal {V}_s$, i.e.,

$$\begin{aligned} R_\varPhi (s) :=\{f(s\odot h + (1-s)\odot v)\;|\;v\in \text {supp}\mathcal {V}_s \}, \end{aligned}$$

where $\text {supp}\mathcal {V}_s$ denotes the support of the distribution $\mathcal {V}_s$. The model $\varPhi $ acts similarly on signals in $R_\varPhi (s)$ due to the low expected distortion $ D(x,s,\mathcal {V}_s, \varPhi )$, which makes the subspace stable under $\varPhi $. Note that RDE directly optimizes towards a subspace that is stable under $\varPhi $. Suppose one instead chose the mask s based on the gradient $\nabla \varPhi (x)$ and the Hessian $\nabla ^2\varPhi (x)$. Then only a local neighborhood around x would tend to be stable under $\varPhi $, due to the local nature of the gradient and Hessian. Before discussing practical algorithms to approximate the RDE mask in Subsect. 3.2, we review frequently used obfuscation strategies, i.e., choices of the distribution $\mathcal {V}_s$, and measures of distortion.

3.1.1 Obfuscation Strategies and in-Distribution Interpretability

The meaning of an explanation in RDE depends greatly on the nature of the perturbations $v\sim \mathcal {V}_s$. A particular choice of $\mathcal {V}_s$ defines an obfuscation strategy. An obfuscation is in-distribution if $ f(s\odot h + (1-s)\odot v) $ lies on the natural data manifold that $\varPhi $ was trained on, and out-of-distribution otherwise. Out-of-distribution obfuscations pose the following problem. The RDE mask (see Definition 2) depends on evaluations of $\varPhi $ on obfuscations $f(s\odot h + (1-s)\odot v)$. If $f(s\odot h + (1-s)\odot v)$ is not on the natural data manifold that $\varPhi $ was trained on, then it may lie in undeveloped regions of $\varPhi $. In practice, we are interested in explaining the behavior of $\varPhi $ on realistic data. An explanation can therefore be corrupted if $\varPhi $ did not develop the region of the out-of-distribution points $f(s\odot h + (1-s)\odot v)$. One can guard against this by choosing $\mathcal {V}_s$ so that $f(s\odot h + (1-s)\odot v)$ is in-distribution. Choosing $\mathcal {V}_s$ in-distribution boils down to modeling the conditional data distribution, which is a non-trivial task.

Example 5 (In-distribution obfuscation strategy)

In light of the recent success of generative adversarial networks (GANs) in generative modeling [8], one can train an in-painting GAN [32]

$$ G(h,s,z)\in \prod _{i=1}^k \mathbb {R}^{d_i}, $$

where z are random latent variables of the GAN, such that the obfuscation $f\big (s\odot h + (1-s)\odot G(h,s,z) \big )$ lies on the natural data manifold (see also [2]). In other words, one can choose $\mathcal {V}_s$ as the distribution of $v:= G(h,s,z)$, where the randomness comes from the random latent variables z.

Example 6 (Out-of-distribution obfuscation strategies)

A very simple obfuscation strategy is Gaussian noise. In that case, one defines $\mathcal {V}_s$ for every $s\in [0,1]^k$ as $ \mathcal {V}_s:= \mathcal {N}(\mu ,\varSigma ), $ where $\mu $ and $\varSigma $ denote a pre-defined mean vector and covariance matrix. In Sect. 4.1, we give an example of a reasonable choice of $\mu $ and $\varSigma $ for image data. Alternatively, for images in pixel representation (see Example 1), one can replace the deleted pixels with a blurred version of the input, $v = K*x$, where K is a suitable blur kernel.
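Both out-of-distribution strategies of Example 6 are easy to sample. The sketch below works on a one-dimensional signal in pixel representation. It assumes a diagonal covariance $\varSigma =\text {diag}(\sigma ^2)$ and a normalized box kernel K with replicated borders. These are illustrative choices, not the settings of our experiments. Neither distribution depends on s, so the mask is not an argument of the samplers.

```python
import random


def gaussian_perturbation(mu, sigma, rng):
    """One draw v ~ N(mu, diag(sigma^2))."""
    return [rng.gauss(m, sd) for m, sd in zip(mu, sigma)]


def box_blur(x, radius=1):
    """Blurred input v = K * x with a normalized box kernel of width 2*radius + 1.

    Border values are replicated so that v has the same length as x.
    """
    n = len(x)
    width = 2 * radius + 1
    return [
        sum(x[min(max(i + t, 0), n - 1)] for t in range(-radius, radius + 1)) / width
        for i in range(n)
    ]


x = [0.0, 0.0, 3.0, 0.0, 0.0]
s = [0.0, 0.0, 1.0, 0.0, 0.0]  # keep only the third entry
v = box_blur(x)                 # blurred replacement: [0.0, 1.0, 1.0, 1.0, 0.0]
y = [s_i * x_i + (1 - s_i) * v_i for s_i, x_i, v_i in zip(s, x, v)]
noise = gaussian_perturbation(mu=[0.6] * 5, sigma=[1.2] * 5, rng=random.Random(0))
```

The blurred replacement leaks information about the kept entry into its deleted neighbors, since y becomes [0.0, 1.0, 3.0, 1.0, 0.0]. This illustrates why the choice of $\mathcal {V}_s$ shapes the meaning of the resulting explanation.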

Table 1. Common obfuscation strategies with their perturbation formulas.

We summarize common obfuscation strategies for a given target signal in Table 1.

3.1.2 Measure of Distortion

Various options exist for the measure $d :\mathbb {R}^m \times \mathbb {R}^m \rightarrow \mathbb {R}_+$ of distortion between model outputs. The measure of distortion should be chosen according to the task of the model $\varPhi :\mathbb {R}^n\rightarrow \mathbb {R}^m$ and the objective of the explanation.

Example 7 (Measure of distortion for classification task)

Consider a classification model $\varPhi :\mathbb {R}^n\rightarrow \mathbb {R}^m$ and a target input signal $x \in \mathbb {R}^n$. The model $\varPhi $ assigns to each class $j\in \left\{ 1, \ldots , m\right\} $ a (pre-softmax) score $\varPhi _j(x)$ and the predicted label is given by $j^*:= \mathop {\mathop {\text {arg max}}\limits }\nolimits _{j \in \left\{ 1, \ldots , m\right\} } \varPhi _j(x)$. One commonly used measure of the distortion between the outputs at x and another data point $y\in \mathbb {R}^n$ is given as

$$\begin{aligned} d_1\big (\varPhi (x),\varPhi (y) \big ) := \big (\varPhi _{j^*}(x)- \varPhi _{j^*}(y) \big )^2. \end{aligned}$$

On the other hand, the vector $[\varPhi _j(x)]_{j =1}^m$ is usually normalized to a probability vector $[\tilde{\varPhi }_j(x)]_{j=1}^m$ by applying the softmax function, namely $\tilde{\varPhi }_j(x) := \exp {\varPhi _j(x)}/\sum _{i = 1}^m\exp {\varPhi _i(x)}$. This, in turn, gives another measure of the distortion between $\varPhi (x), \varPhi (y) \in \mathbb {R}^m$, namely

$$\begin{aligned} d_2\big (\varPhi (x),\varPhi (y) \big ) := \big (\tilde{\varPhi }_{j^*}(x)- \tilde{\varPhi }_{j^*}(y) \big )^2, \end{aligned}$$

where $j^* := \mathop {\mathop {\text {arg max}}\limits }\nolimits _{j \in \left\{ 1, \ldots , m\right\} } \varPhi _j(x) = \mathop {\mathop {\text {arg max}}\limits }\nolimits _{j \in \left\{ 1, \ldots , m\right\} } \tilde{\varPhi }_j(x)$. An important property of the softmax function is the invariance under translation by a vector $[c,\ldots ,c]^\top \in \mathbb {R}^m$, where $c\in \mathbb {R}$ is a constant. By definition, only $d_2$ respects this invariance while $d_1$ does not.
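The translation invariance can be checked numerically. The sketch below implements $d_1$ and $d_2$ for score vectors given as Python lists. Subtracting the maximum inside the softmax is a standard numerical-stability choice that does not change its value. The sketch then compares a score vector with its translate by $c=5$.

```python
import math


def softmax(z):
    """Softmax, stabilized by subtracting the maximum score."""
    z_max = max(z)
    exps = [math.exp(zi - z_max) for zi in z]
    total = sum(exps)
    return [e / total for e in exps]


def predicted_label(phi_x):
    return max(range(len(phi_x)), key=lambda j: phi_x[j])


def d1(phi_x, phi_y):
    """Squared difference of the pre-softmax score of the label j* predicted for x."""
    j_star = predicted_label(phi_x)
    return (phi_x[j_star] - phi_y[j_star]) ** 2


def d2(phi_x, phi_y):
    """Squared difference of the post-softmax probability of the label j* predicted for x."""
    j_star = predicted_label(phi_x)
    return (softmax(phi_x)[j_star] - softmax(phi_y)[j_star]) ** 2


phi_x = [2.0, 0.5, -1.0]
phi_y = [z + 5.0 for z in phi_x]  # translate all scores by c = 5
print(d1(phi_x, phi_y))  # 25.0: the raw score of j* changed by 5
print(d2(phi_x, phi_y))  # 0.0: the softmax probabilities are unchanged
```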

Example 8 (Measure of distortion for regression task)

Consider a regression model $\varPhi :\mathbb {R}^n\rightarrow \mathbb {R}^m$ and an input signal $x \in \mathbb {R}^n$. One can then define the measure of distortion between the outputs of x and another data point $y\in \mathbb {R}^n$ as

$$\begin{aligned} d_3\big (\varPhi (x),\varPhi (y)\big ) := \left\| \varPhi (x)- \varPhi (y) \right\| _2^2. \end{aligned}$$

Sometimes it is reasonable to consider a certain subset of components $J \subseteq \left\{ 1,\ldots ,m\right\} $ of the output vectors instead of all m entries. Denoting the vector formed by corresponding entries by $\varPhi _J(x)$, the measure of distortion between the outputs can be defined as

$$\begin{aligned} d_4\big (\varPhi (x),\varPhi (y)\big ) := \left\| \varPhi _J(x)- \varPhi _J(y) \right\| _2^2. \end{aligned}$$

The measure $d_4$ will be used in our experiments for radio maps in Subsect. 4.3.
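For output maps such as radio maps, $d_3$ and $d_4$ reduce to sums of squared differences over all entries or over the entries indexed by J. The sketch below assumes that outputs are 2-D maps stored as lists of rows. It also assumes that J is given as a list of (row, column) index pairs, e.g., a hypothetical region of interest. Both representations are illustrative.

```python
def d3(phi_x, phi_y):
    """Squared l2 distance between two output maps given as lists of rows."""
    return sum(
        (a - b) ** 2 for row_x, row_y in zip(phi_x, phi_y) for a, b in zip(row_x, row_y)
    )


def d4(phi_x, phi_y, J):
    """Squared l2 distance restricted to the entries whose (row, col) indices are in J."""
    return sum((phi_x[r][c] - phi_y[r][c]) ** 2 for r, c in J)


phi_x = [[1.0, 2.0], [3.0, 4.0]]
phi_y = [[1.0, 0.0], [3.0, 1.0]]
print(d3(phi_x, phi_y), d4(phi_x, phi_y, J=[(0, 1)]))
```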

3.2 Implementation

The RDE mask from Definition 2 was defined as a solution to

$$\begin{aligned} \min _{s\in \{0,1\}^k} \quad D(x,s,\mathcal {V}_s, \varPhi ) \quad \text { s.t. } \quad \left\| s \right\| _0 \le \ell . \end{aligned}$$

In practice, we need to relax this problem. We offer the following three approaches.

3.2.1 $\ell _1$-relaxation with Lagrange Multiplier

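The RDE mask can be approximately computed by finding an approximate solution to the following relaxed minimization problem:

$$\begin{aligned} \min _{s\in [0,1]^k} \quad D(x,s,\mathcal {V}_s, \varPhi ) + \lambda \left\| s \right\| _1, \end{aligned}$$

($\mathcal {P}_{1}$)

where $\lambda >0$ is a hyperparameter for the sparsity level. Note that the optimization problem is not necessarily convex; thus, the solution might not be unique.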

The expected distortion $D(x,s,\mathcal {V}_s, \varPhi )$ can typically be approximated with simple Monte-Carlo estimates, i.e., by averaging the distortion over i.i.d. samples from $\mathcal {V}_s$. After estimating $D(x,s,\mathcal {V}_s, \varPhi )$, one can optimize the mask s with stochastic gradient descent (SGD) to solve the optimization problem ($\mathcal {P}_{1}$).
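A minimal sketch of this procedure rests on strong simplifying assumptions:

- a toy linear model $\varPhi (y)=\langle w,y\rangle $ with scalar output,
- pixel representation (f is the identity),
- perturbations $\mathcal {V}_s=\mathcal {N}(0,\text {Id})$,
- the squared difference as distortion.

For this toy model, the per-sample gradient can be written by hand. For a neural network, one would instead rely on automatic differentiation. After each step, the mask is clipped back into $[0,1]^k$ (projected gradient descent). Since $s\ge 0$, the $\ell _1$ term contributes the constant gradient $\lambda $. All names and hyperparameter values are illustrative, not those used in our experiments.

```python
import random


def rde_l1_sgd(w, x, lam=0.2, lr=0.01, steps=600, num_samples=32, seed=0):
    """Projected SGD on (P1) for a toy linear model Phi(y) = <w, y>.

    Perturbations are v ~ N(0, I), the distortion is (Phi(x) - Phi(y))^2 with
    y = s * x + (1 - s) * v, and s is clipped back into [0, 1]^k after every step.
    """
    rng = random.Random(seed)
    k = len(x)
    s = [1.0] * k  # start from the all-ones mask (nothing deleted)
    for _ in range(steps):
        grad = [lam] * k  # gradient of lam * ||s||_1 on s >= 0
        for _ in range(num_samples):
            v = [rng.gauss(0.0, 1.0) for _ in range(k)]
            # Phi(x) - Phi(y) = sum_i w_i (1 - s_i)(x_i - v_i)
            r = sum(w[i] * (1.0 - s[i]) * (x[i] - v[i]) for i in range(k))
            for i in range(k):
                # d(r^2)/ds_i = -2 r w_i (x_i - v_i), averaged over the Monte-Carlo samples
                grad[i] -= 2.0 * r * w[i] * (x[i] - v[i]) / num_samples
        s = [min(1.0, max(0.0, s_i - lr * g_i)) for s_i, g_i in zip(s, grad)]
    return s


# Only the first feature influences the toy model, so the mask should keep it and drop the rest.
mask = rde_l1_sgd(w=[3.0, 0.0, 0.0, 0.0], x=[1.0, 1.0, 1.0, 1.0])
print([round(m, 2) for m in mask])
```

The sparsity term steadily pushes entries that do not affect the output towards 0. For the relevant entry, the distortion gradient counteracts the sparsity term and keeps the mask value close to 1.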

3.2.2 Bernoulli Relaxation

By viewing the binary mask as Bernoulli random variables $s\sim \text {Ber}(\theta )$ and optimizing over $\theta $, one can guarantee that the expected distortion $D(x,s,\mathcal {V}_s, \varPhi )$ is evaluated on binary masks $s\in \{0,1\}^k$. To encourage sparsity of the resulting mask, one can still apply $\ell _1$-regularization on s, giving rise to the following optimization problem:

$$\begin{aligned} \min _{\theta \in [0,1]^k} \quad \mathop {\mathbb {E}}_{s\sim \text {Ber}(\theta )}\Big [D(x,s,\mathcal {V}_s, \varPhi ) + \lambda \left\| s \right\| _1\Big ]. \end{aligned}$$

Sampling s from $\text {Ber}(\theta )$ is not differentiable with respect to $\theta $, so applying SGD requires a continuous relaxation. This can be done using the concrete distribution [17], which samples s from a continuous relaxation of the Bernoulli distribution.
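The binary case of the concrete distribution [17] can be sampled with logistic noise, as in the sketch below. In an autodiff framework, gradients with respect to $\theta $ would flow through the sigmoid (reparameterization); this plain-Python sketch does not compute them. For every temperature $\tau >0$, a sample exceeds 1/2 with probability $\theta _i$, and as $\tau \rightarrow 0$ the samples concentrate near 0 and 1. The temperature value and the clamping constant are illustrative choices.

```python
import math
import random


def sample_relaxed_bernoulli(theta, temperature, rng):
    """One sample from the binary Concrete (relaxed Bernoulli) distribution:
    s_i = sigmoid((log(theta_i / (1 - theta_i)) + log(u_i / (1 - u_i))) / temperature),
    with u_i ~ Uniform(0, 1)."""
    eps = 1e-12  # keeps the logarithms finite
    sample = []
    for t in theta:
        t = min(max(t, eps), 1.0 - eps)
        u = min(max(rng.random(), eps), 1.0 - eps)
        z = (math.log(t / (1.0 - t)) + math.log(u / (1.0 - u))) / temperature
        # numerically stable sigmoid
        if z >= 0:
            sample.append(1.0 / (1.0 + math.exp(-z)))
        else:
            e = math.exp(z)
            sample.append(e / (1.0 + e))
    return sample


rng = random.Random(0)
s = sample_relaxed_bernoulli([0.9, 0.5, 0.1], temperature=0.1, rng=rng)  # one relaxed mask
```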

3.2.3 Matching Pursuit

As an alternative, one can also perform matching pursuit [18]. Here, the non-zero entries of $s\in \{0,1\}^k$ are determined sequentially in a greedy fashion to minimize the resulting distortion in each step. More precisely, we start with the zero mask $s^0=0$ and gradually build up the mask by updating $s^t$ at step t according to the rule

$$\begin{aligned} s^{t+1} = s^t + \mathop {\text {arg min}}\limits _{e_j:\,s_j^t=0} \, D(x,s^{t}+e_j,\mathcal {V}_s,\varPhi ). \end{aligned}$$

Here, the minimization is taken over all standard basis vectors $e_j \in \mathbb {R}^k$ with $s_j^t = 0$. The algorithm terminates when it reaches a desired error tolerance or after a prefixed number of iterations. Each iteration requires evaluating the distortion for every remaining entry of s. The approach is therefore practical only when k is small or when we are only interested in very sparse masks.
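The greedy update can be sketched as follows, assuming a user-supplied function `distortion(s)` that returns (an estimate of) $D(x,s,\mathcal {V}_s,\varPhi )$ for a binary mask s. The usage example relies on three illustrative assumptions: a linear model $\varPhi (y)=\langle w,y\rangle $, perturbations $\mathcal {N}(0,\text {Id})$, and squared-difference distortion. Under these assumptions the distortion has the closed form $\big (\sum _{i:s_i=0} w_ix_i\big )^2 + \sum _{i:s_i=0} w_i^2$.

```python
def matching_pursuit_mask(distortion, k, max_steps, tol=0.0):
    """Greedy mask construction: starting from the zero mask, switch on the single
    entry that yields the lowest distortion, until max_steps is reached or the
    distortion drops to tol or below."""
    s = [0] * k
    current = distortion(s)
    for _ in range(max_steps):
        if current <= tol:
            break
        best_j, best_value = None, float("inf")
        for j in range(k):
            if s[j] == 0:
                candidate = s.copy()
                candidate[j] = 1
                value = distortion(candidate)
                if value < best_value:
                    best_j, best_value = j, value
        if best_j is None:  # every entry is already switched on
            break
        s[best_j] = 1
        current = best_value
    return s, current


w, x = [2.0, 0.0, -1.0, 0.1], [1.0, 1.0, 1.0, 1.0]


def toy_distortion(s):
    # Exact E[(Phi(x) - Phi(y))^2] for Phi(y) = <w, y> and v ~ N(0, Id)
    deleted = [i for i in range(len(s)) if s[i] == 0]
    return sum(w[i] * x[i] for i in deleted) ** 2 + sum(w[i] ** 2 for i in deleted)


print(matching_pursuit_mask(toy_distortion, k=4, max_steps=2))  # selects entry 0, then entry 2
```

Each step costs one distortion evaluation per remaining entry, so t steps need on the order of $t\cdot k$ evaluations. This is why the method suits small k or very sparse masks.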

4 Experiments

With our experiments, we demonstrate the broad applicability of the generalized RDE framework. Moreover, our experiments illustrate how the different choices of obfuscation strategies, optimization procedures, measures of distortion, and input signal representations discussed in Sects. 3.1 and 3.2 can be leveraged in practice. We explain model decisions on various challenging data modalities and tailor the input signal representation and measure of distortion to the domain and interpretation query. In Sect. 4.1, we focus on image classification, a common baseline task in the interpretability literature. In Sects. 4.2 and 4.3, we consider two other data modalities that are often unexplored. Section 4.2 focuses on audio data, where the underlying task is to classify acoustic instruments based on short audio samples of distinct notes. In Sect. 4.3, the underlying task is a regression on data in the form of physical simulations in urban environments. We also believe our explanation framework supports applications beyond interpretability tasks. An example is given in Sect. 4.3.2, where we add an RDE-inspired regularizer to the training objective of a radio map estimation model.

4.1 Images

We begin with the most common domain in the interpretability literature: image classification. The authors of [16] previously applied RDE to image data by considering pixel-wise perturbations. We refer to this method as Pixel RDE. Other explanation methods [1,2,3,20] have likewise operated exclusively in the pixel domain. In [11], we challenged this customary practice by successfully applying RDE in a wavelet basis, where sparsity translates into piece-wise smooth images (also called cartoon-like images). The resulting explanation method, coined CartoonX [11], extracts the relevant piece-wise smooth part of an image. First, we review the Pixel RDE method and present experiments on the ImageNet dataset [4], which is commonly considered a challenging classification task. Then, we present CartoonX and discuss its advantages. For all ImageNet experiments, we use the pre-trained MobileNetV3-Small [10] as the classifier; it achieves a top-1 accuracy of 67.668% and a top-5 accuracy of 87.402%.

4.1.1 Pixel RDE

Consider the following pixel-wise representation of an RGB image $x\in \mathbb {R}^{3\times n}$: $ f: \prod _{i=1}^n \mathbb {R}^3 \rightarrow \mathbb {R}^{3\times n},\; x = f(h_1,...,h_n), $ where $h_i\in \mathbb {R}^3$ represents the three color channel values of the i-th pixel in the image x, i.e., $(x_{j,i})_{j=1,\ldots ,3}=h_{i}$. In Pixel RDE, a sparse mask $s\in [0,1]^n$ with n entries, one for each pixel, is optimized to achieve low expected distortion $D(x,s,\mathcal {V}_s, \varPhi )$. Let $v\sim \mathcal {V}_s$ be a perturbation, where $\mathcal {V}_s$ is a distribution on $\prod _{i=1}^n \mathbb {R}^3$. The obfuscation of an image x with the pixel mask s and v is defined as $f(s \odot h + (1-s)\odot v)$. In our experiments, we initialize the mask with ones, i.e., $s_i = 1$ for every $i \in \left\{ 1,\ldots , n\right\} $, and consider Gaussian noise perturbations $\mathcal {V}_s = \mathcal {N}(\mu ,\varSigma )$. We set the noise mean $\mu \in \mathbb {R}^{3\times n}$ to the pixel value mean of the original image x. The covariance matrix is $\varSigma :=\sigma ^2{\text {Id}}\in \mathbb {R}^{3n\times 3n}$, a diagonal matrix with $\sigma >0$ defined as the pixel value standard deviation of the original image x. We then optimize the pixel mask s on the $\ell _1$-relaxation of the RDE objective (see Sect. 3.2.1). We computed the distortion $d\big (\varPhi (x),\varPhi (y) \big )$ in $D(x,s,\mathcal {V}_s, \varPhi )$ on the post-softmax activation of the predicted label, multiplied by a constant $C=100$, i.e., $d\big (\varPhi (x),\varPhi (y) \big ) := C\big (\tilde{\varPhi }_{j^*}(x)- \tilde{\varPhi }_{j^*}(y) \big )^2$.

The expected distortion $D(x,s,\mathcal {V}_s, \varPhi )$ was approximated by a simple Monte-Carlo estimate over 64 noise perturbations. For the sparsity level, we set the Lagrange multiplier to $\lambda =0.6$. All images were resized to 256 $\times $ 256 pixels. The mask was optimized for 2000 steps using the Adam optimizer with step size 0.003. The middle row of Fig. 1 shows three example Pixel RDE explanations for images of a snail, a male duck, and an airplane, all from the ImageNet dataset. For the snail, Pixel RDE highlights both the inner shell and part of the head as relevant. For the duck, it highlights the lower segment of the bird along with various lines in the water. For the airplane, it highlights the fuselage and part of the rudder.
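In the pixel representation, one mask entry is shared by the three color channels of each pixel. The following sketch shows this broadcasting together with the noise parameters described above. It assumes an image stored as a list of n pixels, each a list of three channel values. It also takes the mean and the population standard deviation over all channel values pooled together. The text does not specify the pooling or the estimator, so both are assumptions, and the function names are hypothetical.

```python
import random
import statistics


def pixel_noise_params(image):
    """Mean and (population) standard deviation of all pixel values, channels pooled."""
    values = [c for pixel in image for c in pixel]
    return statistics.fmean(values), statistics.pstdev(values)


def pixel_obfuscation(image, s, rng):
    """Pixel RDE obfuscation f(s * h + (1 - s) * v) with v ~ N(mu, sigma^2 Id).

    `image` is a list of n pixels h_i = [r, g, b]; the mask s has n entries, and
    s_i weights all three channels of pixel i.
    """
    mu, sigma = pixel_noise_params(image)
    return [
        [s_i * c + (1.0 - s_i) * rng.gauss(mu, sigma) for c in pixel]
        for s_i, pixel in zip(s, image)
    ]


image = [[0.0, 0.5, 1.0], [0.2, 0.2, 0.2], [0.9, 0.1, 0.4]]
y = pixel_obfuscation(image, s=[1.0, 0.0, 0.5], rng=random.Random(0))
# Pixel 1 is kept, pixel 2 is replaced by noise, pixel 3 is an even blend of image and noise.
```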

4.1.2 CartoonX

Formally, we represent an RGB image $x\in [0,1]^{3\times n}$ in its wavelet coefficients $h = \{h_i\}_{i=1}^n \in \prod _{i=1}^n\mathbb {R}^3$ with $J \in \left\{ {1, \ldots , \lfloor \log _2 n \rfloor }\right\} $ scales as $ x = f(h) $, where f is the discrete inverse wavelet transform. Each $h_i = (h_{i,c})_{c=1}^3\in \mathbb {R}^3$ contains three wavelet coefficients of the image, one for each color channel. It is associated with a scale $k_i\in \left\{ 1, \ldots , J\right\} $ and a position in the image. Low scales describe high frequencies and high scales describe low frequencies at the respective image position. Figure 2 illustrates the wavelet coefficients by visualizing the discrete wavelet transform of an image. CartoonX [11] is a special case of the generalized RDE framework, in particular of Example 2. It optimizes a sparse mask $s\in [0,1]^n$ on the wavelet coefficients (see Fig. 3c) so that the expected distortion $D(x,s,\mathcal {V}_s, \varPhi )$ remains small. The obfuscation of an image x with a wavelet mask s and a perturbation $v\sim \mathcal {V}_s$ on the wavelet coefficients is $f(s \odot h + (1-s)\odot v)$. In our experiments, we used Gaussian noise perturbations and chose the mean and standard deviation adaptively for each scale. For wavelet coefficients of scale $j\in \left\{ 1, \ldots , J\right\} $, they were set to the mean and standard deviation of the wavelet coefficients of scale j of the original image. Figure 3d shows the obfuscation $f(s \odot h + (1-s)\odot v)$ with the final wavelet mask s after the RDE optimization procedure. In Pixel RDE, the mask itself is the explanation, as it lies in pixel space (see middle row in Fig. 1), whereas the CartoonX mask lies in the wavelet domain.
To map the mask back to the natural image domain, we multiply the wavelet mask element-wise with the wavelet coefficients of the original greyscale image and invert this product back to pixel space with the discrete inverse wavelet transform. Finally, the inversion is clipped to [0, 1], as are the obfuscations during the RDE optimization, to avoid overflow (we assume here that the pixel values in x are normalized to [0, 1]). The clipped inversion in pixel space is the final CartoonX explanation (see Fig. 3e).
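The explanation step just described (mask times greyscale coefficients, inverse transform, clipping) can be sketched as follows. To stay self-contained, this sketch swaps in a 1D orthonormal Haar transform for the 2D Daubechies 3 transform used in the chapter. The helpers `haar_forward`, `haar_inverse`, and `cartoonx_explanation` are hypothetical names, and the signal length is assumed to be divisible by $2^J$.

```python
import math

SQRT2 = math.sqrt(2.0)


def haar_forward(x, levels):
    """Multi-level orthonormal Haar DWT of a 1D signal (len(x) divisible by 2**levels).

    Returns (approx, details), where details[0] is the finest scale (scale 1).
    """
    approx, details = list(x), []
    for _ in range(levels):
        pairs = range(len(approx) // 2)
        details.append([(approx[2 * k] - approx[2 * k + 1]) / SQRT2 for k in pairs])
        approx = [(approx[2 * k] + approx[2 * k + 1]) / SQRT2 for k in pairs]
    return approx, details


def haar_inverse(approx, details):
    """Inverse of haar_forward."""
    x = list(approx)
    for d in reversed(details):  # coarsest detail scale first
        out = []
        for a_k, d_k in zip(x, d):
            out.extend([(a_k + d_k) / SQRT2, (a_k - d_k) / SQRT2])
        x = out
    return x


def cartoonx_explanation(gray, mask_approx, mask_details, levels):
    """Multiply the wavelet mask with the greyscale coefficients, invert, clip to [0, 1]."""
    approx, details = haar_forward(gray, levels)
    masked_approx = [m * c for m, c in zip(mask_approx, approx)]
    masked_details = [[m * c for m, c in zip(md, d)] for md, d in zip(mask_details, details)]
    return [min(1.0, max(0.0, v)) for v in haar_inverse(masked_approx, masked_details)]


gray = [0.2, 0.2, 0.8, 0.8]
# Keeping only the coarsest (approximation) coefficient yields a flat, low-resolution image.
print(cartoonx_explanation(gray, [1.0], [[0.0, 0.0], [0.0]], levels=2))  # approximately [0.5] * 4
```

A mask that keeps every coefficient reproduces the greyscale image, while a mask that keeps only coarse scales returns a low-resolution version of it, matching the interpretation points listed next.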

The following points should be kept in mind when interpreting the final CartoonX explanation, i.e., the inversion of the wavelet coefficient mask: (1) CartoonX provides the relevant piece-wise smooth part of the image. (2) The inversion of the wavelet coefficient mask was optimized to be sparse in the wavelet basis, not in pixel space. (3) A region that is black in the inversion could nevertheless be relevant if it was already black in the original image. This is due to the multiplication of the mask with the wavelet coefficients of the greyscale image before taking the discrete inverse wavelet transform. (4) Bright high-resolution regions are relevant in high resolution, and bright low-resolution regions are relevant in low resolution. (5) It is inexpensive for CartoonX to mark large regions in low resolution as relevant. (6) It is expensive for CartoonX to mark large regions in high resolution as relevant.

In Fig. 1, we compare CartoonX to Pixel RDE. The piece-wise smooth wavelet explanations are more interpretable than the jittery Pixel RDEs. In particular, CartoonX indicates that the snail's shell without the head suffices for the classification, unlike Pixel RDE, which suggested that both the inner shell and part of the head are relevant. Moreover, CartoonX shows that the water gives the classifier context for classifying the duck, which one could only have guessed from the Pixel RDE. Both Pixel RDE and CartoonX indicate that the head of the duck is not relevant. Lastly, CartoonX, like Pixel RDE, confirms that the wings play a subordinate role in the classification of the airplane.

4.1.3 Why Explain in the Wavelet Basis?

Wavelets provide optimal representations for piece-wise smooth 1D functions [5] and also efficiently represent 2D piece-wise smooth images, also called cartoon-like images [12, 21]. Indeed, sparse vectors in the wavelet coefficient space encode cartoon-like images reasonably well [27], certainly better than sparse pixel representations. Moreover, the optimization process underlying CartoonX produces sparse vectors in the wavelet coefficient space. Hence, CartoonX typically generates cartoon-like images as explanations. This is the fundamental difference from Pixel RDE, which produces rough, jittery, and pixel-sparse explanations. Cartoon-like images are more interpretable and provide a natural model of simplified images. Since the goal of the RDE explanation is to generate an easy-to-interpret simplified version of the input signal, we argue that CartoonX explanations are more appropriate for image classification than Pixel RDEs. Our experiments confirm that CartoonX explanations are roughly piece-wise smooth and overall more interpretable than Pixel RDEs (see Fig. 1).
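A small numerical illustration of the sparsity argument follows; it is not a proof of the cited approximation results. A piecewise-constant 1D signal with a single jump has at most one nonzero Haar detail coefficient per scale, so it needs far fewer wavelet coefficients than pixels. The Haar transform is again a stand-in for the Daubechies wavelets used in the chapter, and the helper names are hypothetical.

```python
import math


def haar_coefficients(x, levels):
    """All coefficients of a multi-level orthonormal Haar DWT, flattened (finest details first)."""
    approx, coeffs = list(x), []
    for _ in range(levels):
        half = len(approx) // 2
        coeffs += [(approx[2 * k] - approx[2 * k + 1]) / math.sqrt(2) for k in range(half)]
        approx = [(approx[2 * k] + approx[2 * k + 1]) / math.sqrt(2) for k in range(half)]
    return coeffs + approx


def count_nonzero(values, tol=1e-12):
    """Number of entries whose magnitude exceeds tol."""
    return sum(1 for v in values if abs(v) > tol)


# A "cartoon-like" 1D signal: constant, one jump, constant (length 16 = 2**4).
signal = [0.0] * 5 + [1.0] * 11
print(count_nonzero(signal))                        # 11 nonzero pixels
print(count_nonzero(haar_coefficients(signal, 4)))  # 5 nonzero Haar coefficients
```

The count on the wavelet side stays bounded by the number of scales plus one regardless of where the jump is, whereas the pixel count grows with the size of the constant region.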

4.1.4 CartoonX Implementation

Throughout our CartoonX experiments, we chose the Daubechies 3 wavelet system, $J=5$ scales, and zero padding for the discrete wavelet transform. For the implementation of the discrete wavelet transform, we used the PyTorch Wavelets package, which supports gradient computation in PyTorch. Distortion was computed as in the Pixel RDE experiments. The perturbations $v\sim \mathcal {V}_s$ on the wavelet coefficients were chosen as Gaussian noise with standard deviation and mean computed adaptively per scale. As in the Pixel RDE experiments, the wavelet mask was optimized for 2000 steps with the Adam optimizer to minimize the $\ell _1$-relaxation of the RDE objective. We used $\lambda =3$ for CartoonX.
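The per-scale adaptive perturbation can be sketched as follows, with wavelet coefficients given as one list per scale and only one color channel for brevity. The chapter does not state whether the population or the sample standard deviation is used; the sketch assumes the population version, and the helper names are hypothetical. With the PyTorch Wavelets package, such per-scale coefficient lists would presumably come from the lowpass and highpass outputs of its `DWTForward` module, which is not reproduced here.

```python
import random
import statistics


def per_scale_noise(coeffs_by_scale, rng):
    """Gaussian perturbation v whose mean and (population) std match the image's
    wavelet coefficients separately at each scale."""
    noise = []
    for coeffs in coeffs_by_scale:
        mu = statistics.fmean(coeffs)
        sigma = statistics.pstdev(coeffs)
        noise.append([rng.gauss(mu, sigma) for _ in coeffs])
    return noise


def obfuscate_coeffs(coeffs_by_scale, mask_by_scale, noise_by_scale):
    """Wavelet-domain obfuscation s * h + (1 - s) * v, scale by scale (before applying f)."""
    return [
        [s * h + (1.0 - s) * v for h, s, v in zip(hs, ss, vs)]
        for hs, ss, vs in zip(coeffs_by_scale, mask_by_scale, noise_by_scale)
    ]


rng = random.Random(0)
h = [[0.5, -0.5, 0.1, -0.1], [2.0, 1.0], [4.0]]  # fine, middle, coarse scale
v = per_scale_noise(h, rng)
mask = [[1.0, 0.0, 0.0, 0.0], [0.0, 0.0], [1.0]]
print(obfuscate_coeffs(h, mask, v))
```

Matching the noise statistics per scale keeps the perturbation on a plausible magnitude at every resolution, instead of drowning fine-scale coefficients in noise calibrated for coarse ones.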

4.1.5 Efficiency of CartoonX

Finally, we compare Pixel RDE and CartoonX quantitatively by analyzing the distortion and sparsity associated with the final explanation mask. Intuitively, we expect CartoonX to have an efficiency advantage, since the discrete wavelet transform already encodes natural images sparsely, and hence fewer wavelet coefficients than pixel coefficients are required to represent images. Our experiments confirm this intuition, as can be seen in the scatter plot in Fig. 4.

4.2 Audio

We consider the NSynth dataset [6], a library of short audio samples of distinct notes played on a variety of instruments. We pre-process the data by computing the power-normalized magnitude spectrum and phase information using the discrete Fourier transform on a logarithmic scale from 20 to 8000 Hertz. Each data instance is then represented by the magnitude and phase of its Fourier coefficients, which are mapped back to the signal domain by the discrete inverse Fourier transform (see Example 3).
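The magnitude-phase representation and its inverse map f can be illustrated with a naive $O(n^2)$ DFT in standard-library Python. This sketch omits the power normalization and the logarithmic 20 to 8000 Hz frequency grid described above, and the function names are hypothetical; in practice an FFT routine would replace the explicit sums.

```python
import cmath
import math


def to_magnitude_phase(x):
    """Representation h = (|X_k|, arg X_k) from the DFT X of a real signal x."""
    n = len(x)
    spectrum = [
        sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)) for k in range(n)
    ]
    return [abs(c) for c in spectrum], [cmath.phase(c) for c in spectrum]


def from_magnitude_phase(magnitude, phase):
    """Map f: inverse DFT of magnitude * exp(i * phase); the real part is the signal."""
    n = len(magnitude)
    spectrum = [m * cmath.exp(1j * p) for m, p in zip(magnitude, phase)]
    return [
        (sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n).real
        for t in range(n)
    ]


x = [0.0, 1.0, 0.5, -0.25]
mag, ph = to_magnitude_phase(x)
print(from_magnitude_phase(mag, ph))  # recovers x up to floating-point error
```

A mask on this representation then decides, per frequency, whether the magnitude and phase entries are kept or replaced before f is applied.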

4.2.1 Explaining the Classifier

Our model $\varPhi $ is a network trained to classify acoustic instruments. We compute the distortion with respect to the pre-softmax scores, i.e., we use $d_1$ from Example 7 as the measure of distortion. We follow the obfuscation strategy described in Example 5 and train an inpainter G to generate the obfuscation G(h, s, z). Here, h corresponds to the representation of a signal, s is a binary mask, and z is a normally distributed seed for the generator.

We use a residual CNN architecture for G with noise added to the input and to the deep features. More details can be found in Sect. 4.2.3. We train G until its outputs are satisfactory, as exemplified by the outputs in Fig. 5.

To compute the explanation maps, we numerically solve ($\mathcal {P}_2$) as discussed in Subsect. 3.2. In particular, s is a binary mask indicating whether the phase and magnitude information of a certain frequency should be dropped and is specified as a Bernoulli variable $s \sim \text {Ber}(\theta )$. We chose a regularization parameter of $\lambda = 50$ and minimized the corresponding objective using the Adam optimizer with a step size of $10^{-5}$ in $10^6$ iterations. For the concrete distribution, we used a temperature of 0.1. Two examples resulting from this process can be seen in Fig. 6.
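The relaxed Bernoulli mask entries can be sampled with the binary concrete distribution of Maddison et al. [17]. The sketch below draws one relaxed entry for a given Bernoulli parameter θ and temperature t (0.1 above). It is a minimal standard-library illustration with hypothetical names and leaves out the gradient-based optimization of θ (Adam in the chapter), which would require an autodiff framework.

```python
import math
import random


def stable_sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def relaxed_bernoulli(theta, temperature, rng):
    """Binary concrete sample sigmoid((logit(theta) + logit(u)) / t), u ~ Uniform(0, 1).

    As t -> 0 the samples approach exact Bernoulli(theta) draws, and
    P(sample > 0.5) = theta (up to the clamping of u below).
    """
    u = min(max(rng.random(), 1e-12), 1.0 - 1e-12)  # keep log() finite
    logit_theta = math.log(theta) - math.log1p(-theta)
    logistic_noise = math.log(u) - math.log1p(-u)
    return stable_sigmoid((logit_theta + logistic_noise) / temperature)


rng = random.Random(0)
samples = [relaxed_bernoulli(0.9, 0.1, rng) for _ in range(10000)]
print(sum(s > 0.5 for s in samples) / len(samples))  # close to theta = 0.9
```

A low temperature pushes samples towards 0 or 1 while keeping them differentiable in θ, which is what allows optimizing the Bernoulli parameters by gradient descent.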

The explanation reveals that the classifier relies strongly on low frequencies (30 Hz–60 Hz) to classify the top sample in Fig. 6 as a guitar, since only the guitar samples have this low-frequency slope in the spectrum. In contrast, the classification of the bass sample relies more on the continuous signal at 100 Hz and 230 Hz.

4.2.2 Magnitude vs Phase

In the above experiment, we represented the signals by the magnitude and phase information at each frequency, so the mask s acts on each frequency separately. Now we consider the interpretation query of whether the entire magnitude spectrum or the entire phase spectrum is more relevant for the prediction. Accordingly, we consider the representation discussed in Example 4 and apply the mask s to turn the whole magnitude spectrum or the whole phase information on or off. Furthermore, we can optimize s not only for a single datum but for all samples of a class. This reveals whether magnitude or phase is more important for predicting samples from a specific class.

For this, we again minimized ($\mathcal {P}_2$) (averaged over all samples of a class), with $\theta $ as the Bernoulli parameter, using the Adam optimizer for $2 \times 10^5$ iterations with a step size of $10^{-4}$ and the regularization parameter $\lambda =30$. Again, a temperature of $t=0.1$ was used for the concrete distribution.
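The magnitude-versus-phase query uses a mask with only two entries. The sketch below shows the corresponding obfuscation and a class-averaged objective. The inpainted spectra `mag_fill` and `phase_fill` are placeholders for the output of the inpainter G, and `distortion` is a hypothetical callable, so this illustrates the structure of the computation rather than the chapter's implementation.

```python
def obfuscate_magnitude_phase(mag, phase, mag_fill, phase_fill, s_mag, s_phase):
    """Two-entry mask: s_mag gates the whole magnitude spectrum, s_phase the whole phase."""
    new_mag = [s_mag * m + (1.0 - s_mag) * mf for m, mf in zip(mag, mag_fill)]
    new_phase = [s_phase * p + (1.0 - s_phase) * pf for p, pf in zip(phase, phase_fill)]
    return new_mag, new_phase


def class_objective(samples, distortion, s_mag, s_phase, lam=30.0):
    """Distortion averaged over all samples of one class, plus lam * ||(s_mag, s_phase)||_1."""
    mean_distortion = sum(distortion(x, s_mag, s_phase) for x in samples) / len(samples)
    return mean_distortion + lam * (abs(s_mag) + abs(s_phase))


mag, phase = [1.0, 0.5], [0.3, -1.2]
mag_fill, phase_fill = [0.8, 0.7], [0.0, 0.0]  # placeholder for inpainter output
print(obfuscate_magnitude_phase(mag, phase, mag_fill, phase_fill, s_mag=1.0, s_phase=0.0))
# ([1.0, 0.5], [0.0, 0.0])
```

Because the mask has only two entries, the optimized values directly answer the query: the entry that must stay switched on to keep the averaged distortion low marks the more important component.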

The results of these computations, shown in Table 2, reveal a clear difference across instruments in what the classifier bases its decision on. The classification of most instruments is largely based on phase information. For the mallet, both the magnitude and the phase values are low, which means that the expected distortion is very low compared to the $\ell _1$-norm of the mask, even when the signal is completely inpainted. This underlines that the regularization parameter $\lambda $ may have to be adjusted for different data instances, especially when measuring distortion in the pre-softmax scores.

4.2.3 Architecture of the Inpainting Network G

Here, we briefly describe the architecture of the inpainting network G that was used to generate obfuscations of the target signals. Figure 7 shows the diagram of the network G, and Table 3 lists its layers.

Table 2. Magnitude importance versus phase importance.

4.3 Radio Maps

In this subsection, we assume a set of transmitting devices (Tx) broadcasting a signal within a city. The received signal strength varies with location and depends on physical factors such as line of sight, reflection, and diffraction. We consider the regression problem of estimating a function that assigns the proper signal strength to each location in the city. Our dataset $\mathcal {D}$ is RadioMapSeer [14], which contains 700 maps, 80 Tx per map, and a corresponding grayscale label encoding the signal strength at every location. Our model $\varPhi $ receives as input $x = [x^{(0)},x^{(1)},x^{(2)}]$, where $x^{(0)}$ is a binary map of the Tx locations, $x^{(1)}$ is a noisy binary map of the city (where a few buildings are missing), and $x^{(2)}$ is a grayscale image containing a number of ground truth measurements of the signal strength at the measured locations and zero elsewhere. We apply the UNet [13, 14, 22] architecture and train $\varPhi $ to output an estimate of the signal strength throughout the city that interpolates the input measurements.

Apart from the model $\varPhi $, we also have a simpler model $\varPhi _0$, which receives only the city map and the Tx locations as inputs and is trained with unperturbed input city maps. This second model $\varPhi _0$ will be used to inpaint measurements that are fed into $\varPhi $. See Fig. 8a, 8b, and 8c for examples of a ground truth map and the estimations of $\varPhi $ and $\varPhi _{0}$, respectively.

Table 3. Layer table of the Inpainting model for the NSynth task.

4.3.1 Explaining Radio Map $\varPhi $

Observe that in Fig. 8a there is a missing building in the input (the black one), and in Fig. 8b, $\varPhi $ in-fills this building with a shadow. Since $\varPhi $ is a black box, it is unclear why it made this decision. Did it rely on signal measurements or on building patterns? To address this, we consider each building, viewed as a cluster of pixels, and each measurement as potential targets for our mask $s = [s^{(1)}, s^{(2)}]$, where $s^{(1)}$ acts on buildings and $s^{(2)}$ acts on measurements. We then apply matching pursuit (see Subsect. 3.2.3) to find a minimal mask s of critical components (buildings and measurements).
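Subsection 3.2.3 lies outside this excerpt, so the following is only a generic greedy forward selection in the spirit of matching pursuit [18], not necessarily the exact procedure used in the chapter. Starting from an empty mask, it repeatedly switches on the component (building or measurement) whose inclusion lowers the distortion most, until the distortion falls below a tolerance. The `distortion` callable, which would evaluate the obfuscation built from the selected components, is hypothetical.

```python
def greedy_mask(num_components, distortion, tol, max_steps=None):
    """Greedily select components until distortion(selected) <= tol or the budget runs out.

    distortion: callable mapping a set of selected component indices to a float.
    """
    budget = num_components if max_steps is None else min(max_steps, num_components)
    selected = set()
    current = distortion(selected)
    while current > tol and len(selected) < budget:
        candidates = [i for i in range(num_components) if i not in selected]
        best = min(candidates, key=lambda i: distortion(selected | {i}))
        selected.add(best)
        current = distortion(selected)
    return selected, current


# Hypothetical toy distortion: each unselected component contributes a fixed amount.
weights = [0.1, 5.0, 0.2, 3.0]


def toy_distortion(selected):
    return sum(w for i, w in enumerate(weights) if i not in selected)


print(greedy_mask(len(weights), toy_distortion, tol=0.5))  # selects components 1 and 3
```

Greedy selection avoids the combinatorial search over all subsets of buildings and measurements, at the price of possibly missing components that only matter jointly.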

To be precise, suppose we are given a target input signal $x = [x^{(0)}, x^{(1)}, x^{(2)}]$. Let $k_1$ denote the number of buildings in $x^{(1)}$ and $k_2$ the number of measurements in $x^{(2)}$. Consider the function $f_1$ that takes as input vectors in $\left\{ 0,1\right\} ^{k_1}$, which indicate the existence of buildings in $x^{(1)}$, and maps them to the corresponding city map in the original city map format. Analogously, consider the function $f_2$ that takes as input the measurements in $\mathbb {R}^{k_2}$ and maps them to the corresponding grayscale image in the original measurement format. Then, $f_1$ and $f_2$ encode the locations of the buildings and measurements in the target signal $x=[x^{(0)}, f_1(h^{(1)}), f_2(h^{(2)})]$, where $h^{(1)}$ and $h^{(2)}$ denote the building and measurement representations of x in $f_1$ and $f_2$. When $s^{(1)}$ has a zero entry, i.e., a building in $h^{(1)}$ was not selected, we replace the corresponding value in the obfuscation with zero (this corresponds to a constant perturbation equal to zero). Then, the obfuscation of the target signal x with a mask $s=[s^{(1)}, s^{(2)}]$ and perturbations $v=[v^{(1)}, v^{(2)}]:= [0, v^{(2)}] $ becomes:

$$\begin{aligned} y :=[x^{(0)}, f_1(s^{(1)}\odot h^{(1)}), f_2(s^{(2)}\odot h^{(2)}+ (1-s^{(2)})\odot v^{(2)})]. \end{aligned}$$
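The obfuscation y can be made concrete on a toy, flattened grid. In this sketch (hypothetical names and data), buildings are disjoint clusters of pixel indices, `render_buildings` plays the role of $f_1$, `render_measurements` plays the role of $f_2$, and the building perturbation is fixed to $v^{(1)}=0$ as in the text.

```python
def render_buildings(flags, clusters, num_pixels):
    """f_1: write each building's (masked) indicator into its pixel cluster."""
    city = [0.0] * num_pixels
    for flag, pixels in zip(flags, clusters):
        for p in pixels:
            city[p] = float(flag)
    return city


def render_measurements(values, locations, num_pixels):
    """f_2: place measurement values at their pixel locations, zero elsewhere."""
    image = [0.0] * num_pixels
    for value, loc in zip(values, locations):
        image[loc] = value
    return image


def obfuscate_radio(x0, h1, h2, s1, s2, v2, clusters, locations):
    """y = [x0, f_1(s1 * h1), f_2(s2 * h2 + (1 - s2) * v2)]."""
    n = len(x0)
    buildings = [s * h for s, h in zip(s1, h1)]
    measurements = [s * h + (1 - s) * v for s, h, v in zip(s2, h2, v2)]
    return [
        x0,
        render_buildings(buildings, clusters, n),
        render_measurements(measurements, locations, n),
    ]


x0 = [0, 0, 0, 0, 0, 1]    # Tx at pixel 5
clusters = [[0, 1], [2]]   # two buildings
locations = [3, 4]         # two measurements
y = obfuscate_radio(x0, [1, 1], [0.7, 0.2], s1=[1, 0], s2=[1, 0], v2=[0.0, 0.9],
                    clusters=clusters, locations=locations)
print(y)
```

Here the second building is removed and the second measurement is replaced by its perturbation, while the Tx map passes through unchanged.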

While it is natural to model masking out a building by simply zeroing out the corresponding cluster of pixels, i.e., choosing $v^{(1)}=0$, we also need to choose $v^{(2)}$ properly for the entries where the mask $s^{(2)}$ takes the value 0 in order to obtain appropriate obfuscations. For this, we can deploy the second model $\varPhi _0$ as an inpainter. We consider the following two extreme obfuscation strategies. The first is to also set $v^{(2)}$ to zero, i.e., to simply remove the unchosen measurements from the input, under the assumption that any subset of measurements is valid for a city map. In the other extreme case, we inpaint all unchosen measurements by sampling, at their locations, the radio map estimated by $\varPhi _0$ from the buildings selected by $s^{(1)}$.

The two extreme measurement completion methods correspond to two extremes of the interpretation query. Filling in the missing measurements with $\varPhi _0$ tends to overestimate the signal strength, because there are fewer buildings to obstruct the transmissions. With the empty mask, all measurements are completed to the maximal possible signal strength, i.e., the free space radio map. The overestimation in signal strength is reduced when more measurements and buildings are chosen, resulting in darker estimated radio maps. Thus, this strategy is related to the query of which measurements and buildings are important to darken the free space radio map, turning it into the radio map produced by $\varPhi $. In the other extreme, adding more measurements to the mask with a fixed set of buildings typically brightens the resulting radio map. This allows us to answer which measurements are most important for brightening the radio map.

Between these two extreme strategies lies a continuum of completion methods in which a random subset of the unchosen measurements is sampled from $\varPhi _0$, while the rest are set to zero. Examples of explanations of a prediction $\varPhi (x)$ according to these methods are presented in Fig. 9. Since we only care about specific small patches, exemplified by the green boxes, the distortion here is measured as the $\ell _2$ distance between the output images restricted to the corresponding region (see also Example 8).
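The continuum of completion strategies and the region-restricted distortion can be sketched as follows. Here `phi0_values` stands for the estimate of $\varPhi_0$ sampled at the measurement locations. Each unchosen measurement is taken from $\varPhi_0$ independently with a hypothetical probability `p`, so `p = 0` and `p = 1` recover the two extreme strategies. Output maps are flattened lists, and all names are illustrative.

```python
import math
import random


def sample_measurement_perturbation(phi0_values, p, rng):
    """v2: each entry is Phi_0's estimate with probability p, otherwise 0."""
    return [value if rng.random() < p else 0.0 for value in phi0_values]


def regional_l2_distortion(output_a, output_b, region):
    """l2 distance between two flattened output maps restricted to the pixel indices in region."""
    return math.dist([output_a[i] for i in region], [output_b[i] for i in region])


rng = random.Random(0)
phi0_values = [0.9, 0.8, 0.95]
print(sample_measurement_perturbation(phi0_values, 0.5, rng))
print(regional_l2_distortion([0.0, 3.0, 0.0, 5.0], [0.0, 0.0, 0.0, 1.0], region=[1, 3]))  # 5.0
```

Restricting the distance to the region of interest means the mask is only rewarded for preserving the prediction inside the green boxes, not elsewhere in the map.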

When the query is how to darken the free space radio map (Fig. 9), the optimized mask s suggests that samples in the shadow of the missing building are the most influential in the prediction. According to the input city map, these dark measurements should be in line-of-sight of a Tx, which indicates that the network deduced that there is a missing building. When the query is how to fill in the image with both shadows and bright spots (Fig. 9c), both samples in the shadow of the missing building and samples right before the building are influential. This indicates that the network used the bright measurements in line-of-sight and avoided predicting an overly large building. To understand the chosen buildings, note that $\varPhi $ is based on a composition of UNets and can thus be interpreted as a procedure that extracts high-level, global information from the inputs to synthesize the output. The locations of the chosen buildings in Fig. 9 reflect this global nature.

4.3.2 Interpretation-Driven Training

We now discuss an example application of the explanation obtained by the RDE approach described above, called interpretation-driven training [23, 24, 28]. When a missing building is in line-of-sight of a Tx, we would like $\varPhi $ to reconstruct this building by relying on samples in the shadow of the building rather than on patterns in the city. To reduce the reliance of $\varPhi $ on the city information in this situation, one can add a regularization term to the training loss that promotes explanations relying on measurements. Suppose $x = [x^{(0)}, x^{(1)}, x^{(2)}]$ contains a missing input building in line-of-sight of the Tx location, and denote by $J_x$ the subset of pixels of the missing building in the city map. Denote by $\varPhi _{J_x}$ the prediction of $\varPhi $ restricted to the subset $J_x$. Moreover, define $\tilde{x} := [x^{(0)}, 0, x^{(2)}]$ to be the modification of x with all input buildings masked out. We then define the interpretation loss for x as

$$\begin{aligned} \ell _{\text {int}}(\varPhi , x) := \left\| \varPhi _{J_x}(x) - \varPhi _{J_x}(\tilde{x}) \right\| _2^2. \end{aligned}$$

The interpretation-driven training objective then regularizes $\varPhi $ during training by adding the interpretation loss for all inputs x that contain a missing input building in line-of-sight of the Tx location. An example comparison between explanations of the vanilla RadioUNet $\varPhi $ and the interpretation-driven network $\varPhi _{\text {int}}$ is given in Fig. 10.
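The interpretation loss can be written down directly. In this sketch, inputs are triples of flattened maps, `phi` is any callable returning a flattened output map, and `missing_pixels` plays the role of $J_x$. The weight with which $\ell_{\text{int}}$ is added to the training loss is not given in this excerpt, so `gamma` below is a hypothetical hyperparameter, and the two toy models are illustrative only.

```python
def interpretation_loss(phi, x, missing_pixels):
    """l_int(phi, x) = ||phi_J(x) - phi_J(x_tilde)||_2^2 with x_tilde = [x0, 0, x2]."""
    x0, x1, x2 = x
    x_tilde = (x0, [0.0] * len(x1), x2)  # all input buildings masked out
    out, out_tilde = phi(x), phi(x_tilde)
    return sum((out[j] - out_tilde[j]) ** 2 for j in missing_pixels)


def regularized_loss(task_loss, phi, x, missing_pixels, gamma):
    """Task loss plus gamma * l_int; meant for inputs with a missing line-of-sight building."""
    return task_loss + gamma * interpretation_loss(phi, x, missing_pixels)


# Two hypothetical toy models on a 3-pixel map.
def relies_on_city(x):
    return [1.0 - max(x[1])] * len(x[1])  # darkens everything if any building is present


def relies_on_measurements(x):
    return list(x[2])  # copies the measurement map


x = ([0.0, 0.0, 1.0], [0.0, 1.0, 0.0], [0.2, 0.0, 0.0])  # missing building at pixel 0
print(interpretation_loss(relies_on_city, x, [0]))          # 1.0
print(interpretation_loss(relies_on_measurements, x, [0]))  # 0.0
```

A model whose prediction on $J_x$ is unchanged when the city map is removed incurs no penalty, which is exactly the behavior the regularizer is meant to encourage.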

5 Conclusion

In this chapter, we presented the Rate-Distortion Explanation (RDE) framework in a revised and comprehensive manner. Our framework is flexible enough to answer various interpretation queries by considering suitable data representations tailored to the underlying domain and query. We demonstrated this flexibility and the overall efficacy of the RDE framework on an image classification task, an audio signal classification task, and a radio map estimation task, a seldom explored regression task.

References

[1] Bach, S., Binder, A., Montavon, G., Klauschen, F., Müller, K.R., Samek, W.: On pixel-wise explanations for non-linear classifier decisions by layer-wise relevance propagation. PLoS ONE 10(7), e0130140 (2015)
[2] Chang, C., Creager, E., Goldenberg, A., Duvenaud, D.: Explaining image classifiers by counterfactual generation. In: Proceedings of the 7th International Conference on Learning Representations, ICLR (2019)
[3] Dabkowski, P., Gal, Y.: Real time image saliency for black box classifiers. In: Proceedings of the 31st International Conference on Neural Information Processing Systems, NeurIPS, pp. 6970–6979 (2017)
[4] Deng, J., Dong, W., Socher, R., Li, L.J., Li, K., Fei-Fei, L.: ImageNet: a large-scale hierarchical image database. In: Proceedings of the 2009 IEEE Conference on Computer Vision and Pattern Recognition, CVPR, pp. 248–255 (2009)
[5] DeVore, R.A.: Nonlinear approximation. Acta Numer. 7, 51–150 (1998)
[6] Engel, J., et al.: Neural audio synthesis of musical notes with WaveNet autoencoders. In: Proceedings of the 34th International Conference on Machine Learning, ICML, vol. 70, pp. 1068–1077 (2017)
[7] Fong, R.C., Vedaldi, A.: Interpretable explanations of black boxes by meaningful perturbation. In: Proceedings of the 2017 IEEE International Conference on Computer Vision (ICCV), pp. 3449–3457 (2017)
[8] Goodfellow, I.J., et al.: Generative adversarial nets. In: Proceedings of the 27th International Conference on Neural Information Processing Systems, NeurIPS, pp. 2672–2680 (2014)
[9] Heiß, C., Levie, R., Resnick, C., Kutyniok, G., Bruna, J.: In-distribution interpretability for challenging modalities. ICML, Interpret. Sci. Discov. (2020)
[10] Howard, A., et al.: Searching for MobileNetV3. In: Proceedings of the 2019 IEEE/CVF International Conference on Computer Vision (ICCV), pp. 1314–1324 (2019)
[11] Kolek, S., Nguyen, D.A., Levie, R., Bruna, J., Kutyniok, G.: Cartoon explanations of image classifiers. Preprint arXiv:2110.03485 (2021)
[12] Kutyniok, G., Lim, W.-Q.: Compactly supported shearlets are optimally sparse. J. Approx. Theory 163(11), 1564–1589 (2011). https://doi.org/10.1016/j.jat.2011.06.005
[13] Levie, R., Yapar, C., Kutyniok, G., Caire, G.: Pathloss prediction using deep learning with applications to cellular optimization and efficient D2D link scheduling. In: ICASSP 2020–2020 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 8678–8682 (2020). https://doi.org/10.1109/ICASSP40776.2020.9053347
[14] Levie, R., Yapar, C., Kutyniok, G., Caire, G.: RadioUNet: fast radio map estimation with convolutional neural networks. IEEE Trans. Wirel. Commun. 20(6), 4001–4015 (2021)
[15] Lundberg, S.M., Lee, S.: A unified approach to interpreting model predictions. In: Proceedings of the 31st International Conference on Neural Information Processing Systems, NeurIPS, pp. 4768–4777 (2017)
[16] Macdonald, J., Wäldchen, S., Hauch, S., Kutyniok, G.: A rate-distortion framework for explaining neural network decisions. Preprint arXiv:1905.11092 (2019)
[17] Maddison, C.J., Mnih, A., Teh, Y.W.: The concrete distribution: a continuous relaxation of discrete random variables. Preprint arXiv:1611.00712 (2016)
[18] Mallat, S., Zhang, Z.: Matching pursuits with time-frequency dictionaries. IEEE Trans. Signal Process. 41(12), 3397–3415 (1993)
[19] Narasimha, M., Peterson, A.: On the computation of the discrete cosine transform. IEEE Trans. Commun. 26(6), 934–936 (1978)
[20] Ribeiro, M.T., Singh, S., Guestrin, C.: “Why should I trust you?”: Explaining the predictions of any classifier. In: Proceedings of the 22nd International Conference on Knowledge Discovery and Data Mining, ACM SIGKDD, pp. 1135–1144. Association for Computing Machinery (2016)
[21] Romberg, J.K., Wakin, M.B., Baraniuk, R.G.: Wavelet-domain approximation and compression of piecewise smooth images. IEEE Trans. Image Process. 15, 1071–1087 (2006)
[22] Ronneberger, O., Fischer, P., Brox, T.: U-Net: convolutional networks for biomedical image segmentation. In: Navab, N., Hornegger, J., Wells, W.M., Frangi, A.F. (eds.) MICCAI 2015. LNCS, vol. 9351, pp. 234–241. Springer, Cham (2015). https://doi.org/10.1007/978-3-319-24574-4_28
[23] Ross, A.S., Hughes, M.C., Doshi-Velez, F.: Right for the right reasons: training differentiable models by constraining their explanations. In: Proceedings of the Twenty-Sixth International Joint Conference on Artificial Intelligence, IJCAI-17, pp. 2662–2670 (2017)
[24] Schramowski, P., et al.: Making deep neural networks right for the right scientific reasons by interacting with their explanations. Nat. Mach. Intell. 2, 476–486 (2020)
[25] Shrikumar, A., Greenside, P., Kundaje, A.: Learning important features through propagating activation differences. In: Proceedings of the 34th International Conference on Machine Learning, ICML, vol. 70, pp. 3145–3153 (2017)
[26] Smilkov, D., Thorat, N., Kim, B., Viégas, F., Wattenberg, M.: SmoothGrad: removing noise by adding noise. In: Workshop on Visualization for Deep Learning, ICML (2017)
[27] Mallat, S.: A Wavelet Tour of Signal Processing, 3rd edn., Chap. 11.3, pp. 535–610. Academic Press, Boston (2009)
[28] Sun, J., Lapuschkin, S., Samek, W., Binder, A.: Explain and improve: LRP-inference fine-tuning for image captioning models. Inf. Fusion 77, 233–246 (2022)
[29] Sundararajan, M., Taly, A., Yan, Q.: Axiomatic attribution for deep networks. In: Proceedings of the 34th International Conference on Machine Learning, ICML, vol. 70, pp. 3319–3328 (2017)
[30] Teneggi, J., Luster, A., Sulam, J.: Fast hierarchical games for image explanations. Preprint arXiv:2104.06164 (2021)
[31] Wäldchen, S., Macdonald, J., Hauch, S., Kutyniok, G.: The computational complexity of understanding network decisions. J. Artif. Intell. Res. 70 (2019)
[32] Yu, J., Lin, Z., Yang, J., Shen, X., Lu, X., Huang, T.S.: Generative image inpainting with contextual attention. In: Proceedings of the 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, CVPR, pp. 5505–5514 (2018)

Acknowledgements

G.K. acknowledges partial support by the ONE Munich Strategy Forum (LMU Munich, TU Munich, and the Bavarian Ministry for Science and Art), the German Research Foundation under Grants DFG-SPP-2298, KU 1446/31-1 and KU 1446/32-1, and the BMBF under Grant MaGriDo. R.L. acknowledges support by the DFG SPP 1798, KU 1446/21-2 “Compressed Sensing in Information Processing” through Project Massive MIMO-II.

Author information

Authors and Affiliations

Department of Mathematics, LMU Munich, Munich, Germany
Stefan Kolek, Duc Anh Nguyen & Gitta Kutyniok
Courant Institute of Mathematical Sciences, NYU, New York, USA
Joan Bruna
Faculty of Mathematics, Technion - Israel Institute of Technology, Haifa, Israel
Ron Levie

Corresponding author

Correspondence to Stefan Kolek.

Editor information

Editors and Affiliations

University of Natural Resources and Life Sciences Vienna, Vienna, Austria
Andreas Holzinger
University of Alberta, Edmonton, AB, Canada
Randy Goebel
Princeton University, Princeton, NJ, USA
Ruth Fong
Seoul National University, Seoul, Korea (Republic of)
Taesup Moon
Technische Universität Berlin, Berlin, Germany
Klaus-Robert Müller
Fraunhofer Heinrich Hertz Institute, Berlin, Germany
Wojciech Samek

Rights and permissions

Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.

About this chapter

Cite this chapter

Kolek, S., Nguyen, D.A., Levie, R., Bruna, J., Kutyniok, G. (2022). A Rate-Distortion Framework for Explaining Black-Box Model Decisions. In: Holzinger, A., Goebel, R., Fong, R., Moon, T., Müller, KR., Samek, W. (eds) xxAI - Beyond Explainable AI. xxAI 2020. Lecture Notes in Computer Science(), vol 13200. Springer, Cham. https://doi.org/10.1007/978-3-031-04083-2_6

DOI: https://doi.org/10.1007/978-3-031-04083-2_6
Published: 17 April 2022
Publisher Name: Springer, Cham
Print ISBN: 978-3-031-04082-5
Online ISBN: 978-3-031-04083-2
eBook Packages: Computer Science, Computer Science (R0)
