1 Introduction

Key to the wide success of deep neural networks is end-to-end learning of powerful hidden representations that aim to (i) capture all task-relevant characteristics while (ii) being invariant to all other variabilities in the data [LeC12, AS18]. Deep learning can yield abstract representations that are perfectly adapted feature encodings for the task at hand. However, this increasing capability for abstraction and performance comes at the expense of interpretability [BBM+15]: although the network may solve a problem, it does not convey an understanding of its predictions or their causes, often leaving the impression of a black box [Mil19]. In particular, users are missing an explanation of semantic concepts that the model has learned to represent and of those it has learned to ignore, i.e., its invariances.

Providing such explanations and an understanding of network predictions and their causes is thus crucial for transparent AI. Not only is this relevant to discover limitations and promising directions for future improvements of the AI system itself, but also for compliance with legislation [GF17, Eur20], knowledge distillation from such a system [Lip18], and post hoc verification of the model [SWM17]. Consequently, research on interpretable deep models has recently gained a lot of attention, particularly methods that investigate latent representations to understand what the model has learned [SWM17, SZS+14, BZK+17, FV18, ERO20].

Challenges and aims: Assessing these latent representations is challenging due to two fundamental issues: (i) To achieve robustness and generalization despite noisy inputs and data variability, hidden layers exhibit a distributed coding of semantically meaningful concepts [FV18]. Attributing semantics to a single neuron via backpropagation [MLB+17] or synthesis [YCN+15] is thus impossible without altering the network [MSM18, ZKL+16], which typically degrades performance. (ii) End-to-end learning trains deep representations toward a goal task, making them invariant to features irrelevant for this goal. Understanding these characteristics that a representation has abstracted away is challenging, since we essentially need to portray features that have been discarded.

These challenges call for a method that can interpret existing network representations by recovering their invariances without modifying them. Given these recovered invariances, we seek an invertible mapping that translates a representation and the invariances onto understandable semantic concepts. The mapping disentangles the distributed encoding of the high-dimensional representation and its invariances by projecting them onto separate multi-dimensional factors that correspond to human-understandable semantic concepts. Both this translation and the recovering of invariances are implemented with invertible neural networks (INNs) [Red93, DSB17, KD18]. For the translation, this guarantees that the resulting understandable representation is equally expressive as the model representation combined with the recovered invariances (no information is lost). Its invertibility also warrants that feature modifications applied in the semantic domain correctly adjust the recovered representation.

Our contributions: Our contributions to a comprehensive understanding of deep representations are as follows: (i) We present an approach, which, by utilizing invertible neural networks, improves the understanding of representations produced by existing network architectures with no need for re-training or otherwise compromising their performance. (ii) Our generative approach is able to recover the invariances that result from the non-injective projection (of input onto a latent representation) which deep networks typically learn. This model then provides a probabilistic visualization of the latent representation and its invariances. (iii) We bijectively translate an arbitrarily abstract representation and its invariances via a non-linear transformation into another representation of equal expressiveness, but with accessible semantic concepts. (iv) The invertibility also enables manipulation of the original latent representations in a semantically understandable manner, thus facilitating further diagnostics of a network.

2 Background

Two main approaches to interpretable AI can be identified: those which aim to incorporate interpretability directly into the design of models, and those which aim to provide interpretability to existing models [MSM18]. Approaches from the first category range from modifications of network architectures [ZKL+16], through regularization of models to encourage interpretability [LBMO19, PASC+20], to combinations of both [ZNWZ18]. However, these approaches always involve a trade-off between model performance and model interpretability. Belonging to the latter category, our approach interprets representations of existing models without compromising their performance.

To better understand what an existing model has learned, its representations must be studied [SWM17]. Szegedy et al. [SZS+14] show that both random directions and coordinate axes in the feature space of networks can represent semantic properties and conclude that these properties are not necessarily represented by individual neurons. Different works attempt to select groups of neurons which have a certain semantic meaning, e.g., based on scenes [ZKL+15], objects [SR15], and object parts [SRD14]. [BZK+17] studied the interpretability of neurons and found that a rotation of the representation space spanned by the neurons decreases its interpretability. While this suggests that the neurons provide a more interpretable basis compared to a random basis, [FV18] shows that the choice of basis is not the only challenge for interpretability of representations. Their findings demonstrate that learned representations are distributed, i.e., a single semantic concept is encoded by an activation pattern involving multiple neurons, and a single neuron is involved in the encoding of multiple different semantic concepts. Instead of selecting a set of neurons directly, [ERO20] learns an INN that transforms the original representation space to an interpretable space, where a single semantic concept is represented by a known group of neurons and a single neuron is involved in the encoding of just a single semantic concept. However, to interpret not only the representation itself but also its invariances, it is insufficient to transform only the representation itself. Our approach therefore transforms the latent representation space of an autoencoder, which has the capacity to represent its inputs faithfully, and subsequently translates a model representation and its invariances into this space for semantic interpretation and visualization.

A large body of work approaches interpretability of existing networks via visualizations. Selvaraju et al. [SCD+20] use gradients of network outputs with respect to a convolutional layer to obtain coarse localization maps. Bach et al. [BBM+15] propose an approach to obtain pixel-wise relevance scores for a specific class of models, which is generalized in [MLB+17]. To obtain richer visual interpretations, [ZF14, SVZ14, YCN+15, MV16] reconstruct images which maximally activate certain neurons. Nguyen et al. [NDY+16] use a generator network for this task, which was introduced in [DB16] for reconstructing images from their feature representation. Our key insight is that these existing approaches do not explicitly account for the invariances learned by a model. Invariances imply that feature inversion is a one-to-many mapping, and thus they must be recovered to solve the task. Recently, [SGM+20] introduced a GAN-based approach that utilizes features of a pretrained classifier as a semantic pyramid for image generation. Nash et al. [NKW19] used samples from an autoregressive model of images conditioned on a feature representation to gain insights into the representation’s invariances. In contrast, our approach recovers an explicit representation of the invariances, which can be recombined with modified feature representations, and thus makes the effect of modifications to representations, e.g., through adversarial attacks, visible.

Other works consider visual interpretations for specialized models. Santurkar et al. [SIT+19] showed that the quality of images which maximally activate certain neurons is significantly improved when activating neurons of an adversarially robust classifier. Bau et al. [BZS+19] explore the relationship between neurons and the images produced by a generative adversarial network. For the same class of models, [GAOI19] finds directions in their input space which represent semantic concepts corresponding to certain cognitive properties. Such semantic directions have previously also been found in classifier networks [UGP+17], but this requires aligned data. All of these approaches either require special training of models, are limited to a very specific class of models which already provide visualizations, or depend on special assumptions about model and data. In contrast, our approach can be applied to arbitrary models without re-training or modifying them, and provides both visualizations and semantic explanations, for both the model’s representation and its learned invariances.

Fig. 1

Proposed architecture. We provide post hoc interpretation for a given deep network \(\boldsymbol{f}= \boldsymbol{\Psi }\circ \boldsymbol{\Phi }\). For a deep representation \(\boldsymbol{z}= \boldsymbol{\Phi }(\boldsymbol{x})\) a conditional INN \(\boldsymbol{t}\) recovers \(\boldsymbol{\Phi }\)’s invariances \(\boldsymbol{v}\) from a representation \(\boldsymbol{\hat{z}}\) which contains entangled information about both \(\boldsymbol{z}\) and \(\boldsymbol{v}\). The INN \(\boldsymbol{e}\) then translates the representation \(\boldsymbol{\hat{z}}\) into a factorized representation with accessible semantic concepts. This approach allows for various applications, including visualizations of network representations of natural (green box) and altered inputs (blue box), semantic network analysis (red box) and semantic image modifications (yellow box)

3 Method

Common tasks of computer vision can be phrased as a mapping from an input image \(\boldsymbol{x}\) to some output \(\boldsymbol{f}(\boldsymbol{x})\) such as a classification of the image, a regression (e.g., of object locations), a (semantic) segmentation map, or a re-synthesis that yields another image. Deep learning utilizes a hierarchy of intermediate network layers that gradually transform the input into increasingly more abstract representations. Let \(\boldsymbol{z}=\boldsymbol{\Phi }(\boldsymbol{x}) \in \mathbb {R}^{N_{\boldsymbol{z}}}\) be the representation extracted by one such layer (without loss of generality we consider \(\boldsymbol{z}\) to be an \(N_{\boldsymbol{z}}\)-dim vector, flattening it if necessary) and \(\boldsymbol{f}(\boldsymbol{x})=\boldsymbol{\Psi }(\boldsymbol{z})=\boldsymbol{\Psi }(\boldsymbol{\Phi }(\boldsymbol{x}))\) the mapping onto the output.

An essential characteristic of a deep feature encoding \(\boldsymbol{z}\) is the increasing abstractness of higher feature encoding layers and the resulting reduction of information. This reduction generally causes the feature encoding to become invariant to those properties of the input image which do not provide salient information for the task at hand [CWG+18]. To explain a latent representation, we need to recover such invariances \(\boldsymbol{v}\) and make \(\boldsymbol{z}\) and \(\boldsymbol{v}\) interpretable by learning a bijective mapping onto understandable semantic concepts, see Fig. 1. Section 3.1 describes our INN \(\boldsymbol{t}\) to recover an encoding \(\boldsymbol{v}\) of the invariances. Due to the generative nature of \(\boldsymbol{t}\), our approach can correctly sample visualizations of the model representation and its invariances without leaving the underlying data distribution or introducing artifacts. With \(\boldsymbol{v}\) then available, Sect. 3.2 presents an INN \(\boldsymbol{e}\) that translates \(\boldsymbol{t}\)’s encoding of \(\boldsymbol{z}\) and \(\boldsymbol{v}\) onto disentangled semantic concepts without losing information. Moreover, the invertibility allows modifications in the semantic domain to correctly project back onto the original representation or into image space.

3.1 Recovering the Invariances of Deep Models

Learning an encoding to help recover invariances: Key to a deep representation is not only the information \(\boldsymbol{z}\) captures, but also what it has learned to abstract away. To learn what \(\boldsymbol{z}\) misses with respect to \(\boldsymbol{x}\), we need an encoding \(\boldsymbol{\hat{z}}\), which, in contrast to \(\boldsymbol{z}\), includes the invariances exhibited by \(\boldsymbol{z}\). Without making prior assumptions about the deep model \(\boldsymbol{f}\), autoencoders provide a generic way to obtain such an encoding \(\boldsymbol{\hat{z}}\), since they ensure that their input \(\boldsymbol{x}\) can be recovered from their learned representation \(\boldsymbol{\hat{z}}\), which hence also comprises the invariances.

Therefore, we learn an autoencoder with an encoder \(\boldsymbol{E}\) that provides the data representation \(\boldsymbol{\hat{z}}= \boldsymbol{E}(\boldsymbol{x})\) and a decoder \(\boldsymbol{D}\) producing the data reconstruction \(\boldsymbol{\hat{x}}= \boldsymbol{D}(\boldsymbol{\hat{z}})\). Section 3.2 will utilize the decoding from \(\boldsymbol{\hat{z}}\) to \(\boldsymbol{\hat{x}}\) to visualize both \(\boldsymbol{z}\) and \(\boldsymbol{v}\). The autoencoder is trained to reconstruct its inputs by minimizing a perceptual metric between input and reconstruction, \(\Vert \boldsymbol{x}- \boldsymbol{\hat{x}}\Vert \), as in [DB16]. The details of the architecture and training procedure can be found in Sect. 3.3, autoencoder E, D. Importantly, the autoencoder needs to be trained only once on the training data. Consequently, the same \(\boldsymbol{E}\) can be used to interpret different representations \(\boldsymbol{z}\), e.g., different models or layers within a model, thus ensuring fair comparisons between them. Moreover, the complexity of the autoencoder can be adjusted based on the computational needs, allowing us to work with much lower dimensional encodings \(\boldsymbol{\hat{z}}\) compared to reconstructing the invariances directly from the images \(\boldsymbol{x}\). This reduces the computational demands of our approach significantly.

Learning a conditional INN that recovers invariances: Due to the reconstruction task of the autoencoder, \(\boldsymbol{\hat{z}}\) not only contains the invariances \(\boldsymbol{v}\), but also the representation \(\boldsymbol{z}\). Thus, we must disentangle [EHO19, LSOL20, KSLO19] \(\boldsymbol{v}\) and \(\boldsymbol{z}\) using a mapping \(\boldsymbol{t}(\cdot \vert \boldsymbol{z}): \boldsymbol{\hat{z}}\mapsto \boldsymbol{v}= \boldsymbol{t}(\boldsymbol{\hat{z}}\vert \boldsymbol{z})\) which, depending on \(\boldsymbol{z}\), extracts \(\boldsymbol{v}\) from \(\boldsymbol{\hat{z}}\).

Besides extracting the invariances from a given \(\boldsymbol{\hat{z}}\), \(\boldsymbol{t}\) must also enable an inverse mapping from given model representations \(\boldsymbol{z}\) to \(\boldsymbol{\hat{z}}\) to support a further mapping onto semantic concepts (Sect. 3.2) and visualization based on \(\boldsymbol{D}(\boldsymbol{\hat{z}})\). There are many different \(\boldsymbol{x}\) with \(\boldsymbol{\Phi }(\boldsymbol{x}) = \boldsymbol{z}\), namely, all those \(\boldsymbol{x}\) which differ only in properties that \(\boldsymbol{\Phi }\) is invariant to. Thus, there are also many different \(\boldsymbol{\hat{z}}\) that this mapping must recover. Consequently, the mapping from \(\boldsymbol{z}\) to \(\boldsymbol{\hat{z}}\) is set-valued. However, to understand \(\boldsymbol{f}\) we do not want to recover all possible \(\boldsymbol{\hat{z}}\), but only those which are likely under the training distribution of the autoencoder. In particular, this excludes unnatural images such as those obtained by DeepDream [MOT15] or adversarial attacks [SZS+14]. In conclusion, we need to sample \(\boldsymbol{\hat{z}}\sim p(\boldsymbol{\hat{z}}\vert \boldsymbol{z})\).

To avoid a costly inversion process of \(\boldsymbol{\Phi }\), \(\boldsymbol{t}\) must be invertible (implemented as an INN) so that a change of variables

$$\begin{aligned} p(\boldsymbol{\hat{z}}\vert \boldsymbol{z}) = \frac{p(\boldsymbol{v}\vert \boldsymbol{z})}{\vert \det \nabla (\boldsymbol{t}^{-1})(\boldsymbol{v}\vert \boldsymbol{z}) \vert } \quad \text {, where } \boldsymbol{v}= \boldsymbol{t}(\boldsymbol{\hat{z}}\vert \boldsymbol{z}), \end{aligned}$$
(1)

yields \(p(\boldsymbol{\hat{z}}\vert \boldsymbol{z})\) by means of the distribution \(p(\boldsymbol{v}\vert \boldsymbol{z})\) of invariances, given a model representation \(\boldsymbol{z}\). Here, the denominator denotes the absolute value of the determinant of the Jacobian \(\nabla (\boldsymbol{t}^{-1})\) of \(\boldsymbol{v}\mapsto \boldsymbol{t}^{-1}(\boldsymbol{v}\vert \boldsymbol{z})=\boldsymbol{\hat{z}}\), which is efficient to compute for common invertible network architectures. Consequently, we obtain \(\boldsymbol{\hat{z}}\) for given \(\boldsymbol{z}\) by sampling from the invariant space \(\boldsymbol{v}\) given \(\boldsymbol{z}\) and then applying \(\boldsymbol{t}^{-1}\),

$$\begin{aligned} \boldsymbol{\hat{z}}\sim p(\boldsymbol{\hat{z}}\vert \boldsymbol{z}) \quad \iff \quad \boldsymbol{v}\sim p(\boldsymbol{v}\vert \boldsymbol{z}) , \boldsymbol{\hat{z}}= \boldsymbol{t}^{-1}(\boldsymbol{v}\vert \boldsymbol{z}) . \end{aligned}$$
(2)

Since \(\boldsymbol{v}\) is the invariant space for \(\boldsymbol{z}\), both are complementary, implying independence \(p(\boldsymbol{v}\vert \boldsymbol{z}) = p(\boldsymbol{v})\). Because a sufficiently powerful transformation \(\boldsymbol{t}^{-1}\) can transform between two arbitrary densities, we can assume without loss of generality a Gaussian prior \(p(\boldsymbol{v}) = \mathcal {N}(\boldsymbol{v}\vert \boldsymbol{0}, \boldsymbol{I})\), where \(\boldsymbol{I}\) is the identity matrix. Given this prior, our task is then to learn the transformation \(\boldsymbol{t}\) that maps \(\mathcal {N}(\boldsymbol{v}\vert \boldsymbol{0}, \boldsymbol{I})\) onto \(p(\boldsymbol{\hat{z}}\vert \boldsymbol{z})\). To this end, we maximize the log-likelihood of \(\boldsymbol{\hat{z}}\) given \(\boldsymbol{z}\), which results in a per-example loss of

$$\begin{aligned} J_e(\boldsymbol{\hat{z}}, \boldsymbol{z}) = -\log p(\boldsymbol{\hat{z}}\vert \boldsymbol{z}) = -\log \mathcal {N}(\boldsymbol{t}(\boldsymbol{\hat{z}}\vert \boldsymbol{z}) \vert \boldsymbol{0}, \boldsymbol{I}) - \log \vert \det \nabla \boldsymbol{t}(\boldsymbol{\hat{z}}\vert \boldsymbol{z}) \vert . \end{aligned}$$
(3)

Minimizing this loss over the training data distribution \(p(\boldsymbol{x})\) gives \(\boldsymbol{t}\), a bijective mapping between \(\boldsymbol{\hat{z}}\) and (\(\boldsymbol{z}, \boldsymbol{v}\)),

$$\begin{aligned} J(\boldsymbol{t})&= \mathbb {E}_{\boldsymbol{x}\sim p(\boldsymbol{x})} \left[ J_e(\boldsymbol{E}(\boldsymbol{x}), \boldsymbol{\Phi }(\boldsymbol{x})) \right] \end{aligned}$$
(4)
$$\begin{aligned}&= \mathbb {E}_{\boldsymbol{x}\sim p(\boldsymbol{x})} \left[ \frac{1}{2}\Vert \boldsymbol{t}(\boldsymbol{E}(\boldsymbol{x}) \vert \boldsymbol{\Phi }(\boldsymbol{x}))\Vert ^2 + \frac{N_{\boldsymbol{\hat{z}}}}{2}\log 2\pi - \log \vert \det \nabla \boldsymbol{t}(\boldsymbol{E}(\boldsymbol{x}) \vert \boldsymbol{\Phi }(\boldsymbol{x})) \vert \right] . \end{aligned}$$
(5)

Note that both \(\boldsymbol{E}\) and \(\boldsymbol{\Phi }\) remain fixed during minimization of \(J\).
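
To make the objective concrete, the following is a minimal PyTorch sketch of the per-batch loss in (3)–(5). The interfaces are assumptions for illustration: `cinn(z_hat, z)` stands for the conditional INN \(\boldsymbol{t}\) and is assumed to return both \(\boldsymbol{v}= \boldsymbol{t}(\boldsymbol{\hat{z}}\vert \boldsymbol{z})\) and the log-determinant of its Jacobian, while `encoder` and `phi` denote the frozen \(\boldsymbol{E}\) and \(\boldsymbol{\Phi }\).

```python
import math
import torch

def cinn_nll_loss(cinn, encoder, phi, x):
    # E and Phi stay frozen during training of t; they only provide
    # z_hat = E(x) and z = Phi(x) for the current batch x.
    with torch.no_grad():
        z_hat = encoder(x)
        z = phi(x)
    # cinn is assumed to return (v, logdet) with v = t(z_hat | z) and
    # logdet = log |det grad t(z_hat | z)|, summed over dimensions.
    v, logdet = cinn(z_hat, z)
    n = v[0].numel()
    nll = 0.5 * v.flatten(1).pow(2).sum(dim=1) + 0.5 * n * math.log(2 * math.pi) - logdet
    return nll.mean()  # Monte-Carlo estimate of J(t) in (4)-(5)
```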

3.2 Interpreting Representations and Their Invariances

Visualizing representations and invariances: For an image representation \(\boldsymbol{z}= \boldsymbol{\Phi }(\boldsymbol{x})\), (2) presents an efficient approach (a single forward pass through the INN \(\boldsymbol{t}\)) to sample an encoding \(\boldsymbol{\hat{z}}\), which is a combination of \(\boldsymbol{z}\) with a particular realization of its invariances \(\boldsymbol{v}\). Sampling multiple realizations of \(\boldsymbol{\hat{z}}\) for a given \(\boldsymbol{z}\) highlights what remains constant and what changes due to different \(\boldsymbol{v}\): information preserved in the representation \(\boldsymbol{z}\) remains constant over different samples, whereas information discarded by the model ends up in the invariances \(\boldsymbol{v}\) and therefore varies across samples. Visualizing the samples \(\boldsymbol{\hat{z}}\sim p(\boldsymbol{\hat{z}}\vert \boldsymbol{z})\) with \(\boldsymbol{\hat{x}}= \boldsymbol{D}(\boldsymbol{\hat{z}})\) portrays this constancy and variation. To complement this visualization, in the following, we learn a transformation of \(\boldsymbol{\hat{z}}\) into a semantically meaningful representation which allows us to uncover the semantics captured by \(\boldsymbol{z}\) and \(\boldsymbol{v}\).
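
A minimal sketch of this sampling procedure, assuming the same hypothetical `cinn` interface as above, with `cinn.inverse(v, z)` implementing \(\boldsymbol{t}^{-1}(\boldsymbol{v}\vert \boldsymbol{z})\) and `cinn.dim_v` giving the dimensionality of \(\boldsymbol{v}\):

```python
import torch

@torch.no_grad()
def visualize_representation(decoder, cinn, z, n_samples=8):
    # Repeat the single model representation z (batch size 1) and draw one
    # invariance sample v ~ N(0, I) for each copy, cf. (2).
    z = z.expand(n_samples, *z.shape[1:])
    v = torch.randn(n_samples, cinn.dim_v, device=z.device)
    z_hat = cinn.inverse(v, z)          # z_hat = t^{-1}(v | z)
    return decoder(z_hat)               # images x_hat = D(z_hat)
```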

Learning an INN to produce semantic interpretations: The autoencoder representation \(\boldsymbol{\hat{z}}\) is an equivalent representation of \((\boldsymbol{z}, \boldsymbol{v})\) but its feature dimensions do not necessarily correspond to semantic concepts [FV18]. More generally, without supervision, we cannot reliably discover semantically meaningful, explanatory factors of \(\boldsymbol{\hat{z}}\) [LBL+19]. In order to explain \(\boldsymbol{\hat{z}}\) in terms of given semantic concepts, we apply the approach of [ERO20] and learn a bijective transformation of \(\boldsymbol{\hat{z}}\) to an interpretable representation \(\boldsymbol{e}(\boldsymbol{\hat{z}})\) where different groups of components, called factors, correspond to semantic concepts.

To learn the transformation \(\boldsymbol{e}\), we parameterize \(\boldsymbol{e}\) by an INN and assume that semantic concepts are defined implicitly by pairs of images, i.e., for each semantic concept we have access to training pairs \(\boldsymbol{x}^\mathrm {a}, \boldsymbol{x}^\mathrm {b}\) that have the respective concept in common. For example, the semantic concept “smiling” is defined by pairs of images where either both images show smiling persons or both images show non-smiling persons. With this formulation, input pairs that are similar in a certain semantic concept are mapped to similar values of the corresponding factor of the interpretable representation \(\boldsymbol{e}(\boldsymbol{\hat{z}})\).

Following [ERO20], the loss for training the invertible network \(\boldsymbol{e}\) is then given by

$$\begin{aligned} J(\boldsymbol{e}) = \mathbb {E}_{\boldsymbol{x}^\mathrm {a}, \boldsymbol{x}^\mathrm {b}}&\left[ -\log p(\boldsymbol{e}(\boldsymbol{E}(\boldsymbol{x}^\mathrm {a})), \boldsymbol{e}(\boldsymbol{E}(\boldsymbol{x}^\mathrm {b}))) \right. \nonumber \\&\left. -\log \vert \det \nabla \boldsymbol{e}(\boldsymbol{E}(\boldsymbol{x}^\mathrm {a})) \vert -\log \vert \det \nabla \boldsymbol{e}(\boldsymbol{E}(\boldsymbol{x}^\mathrm {b})) \vert \right] . \end{aligned}$$
(6)

Interpretation by applying the learned INNs: After training, the combination of \(\boldsymbol{e}\) with \(\boldsymbol{t}\) from Sect. 3.1 provides semantic interpretations given a model representation \(\boldsymbol{z}\): (2) gives realizations of the invariances \(\boldsymbol{v}\) which are combined with \(\boldsymbol{z}\) to produce \(\boldsymbol{\hat{z}}= \boldsymbol{t}^{-1}(\boldsymbol{v}\vert \boldsymbol{z})\). Then \(\boldsymbol{e}\) transforms \(\boldsymbol{\hat{z}}\) without loss of information into a semantically accessible representation \((\boldsymbol{e}_i)_i = \boldsymbol{e}(\boldsymbol{\hat{z}}) = \boldsymbol{e}(\boldsymbol{t}^{-1}(\boldsymbol{v}\vert \boldsymbol{z}))\) consisting of different semantic factors \(\boldsymbol{e}_i\). Comparing the \(\boldsymbol{e}_i\) for different model representations \(\boldsymbol{z}\) and invariances \(\boldsymbol{v}\) allows us to observe which semantic concepts the model representation \(\boldsymbol{z}=\boldsymbol{\Phi }(\cdot )\) is sensitive to, and which it is invariant to.

Semantic Modifications of Latent Representations: The transformations \(\boldsymbol{t}^{-1}\) and \(\boldsymbol{e}\) not only interpret a representation \(\boldsymbol{z}\) in terms of accessible semantic concepts \((\boldsymbol{e}_i)_i\). Given \(\boldsymbol{v}\sim p(\boldsymbol{v})\), they also allow us to modify \(\boldsymbol{\hat{z}}=\boldsymbol{t}^{-1}(\boldsymbol{v}\vert \boldsymbol{z})\) in a semantically meaningful manner by altering its corresponding \((\boldsymbol{e}_i)_i\) and then applying the inverse translation \(\boldsymbol{e}^{-1}\),

$$\begin{aligned} \boldsymbol{\hat{z}}\xrightarrow {\boldsymbol{e}} (\boldsymbol{e}_i) \xrightarrow {\text {modification}} (\boldsymbol{e}_i^{*}) \xrightarrow {\boldsymbol{e}^{-1}} \boldsymbol{\hat{z}}^{*}. \end{aligned}$$
(7)

The modified representation \(\boldsymbol{\hat{z}}^{*}\) is then readily transformed back into image space \(\boldsymbol{\hat{x}}^{*} = \boldsymbol{D}(\boldsymbol{\hat{z}}^{*})\). Besides visual interpretation of the modification, \(\boldsymbol{\hat{x}}^{*}\) can be fed into the model \(\boldsymbol{\Psi }(\boldsymbol{\Phi }(\boldsymbol{\hat{x}}^{*}))\) to probe for sensitivity to certain semantic concepts.
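
The full modification pipeline of (7) then reads as follows in a minimal sketch; `sem_inn` is a hypothetical interface to the INN \(\boldsymbol{e}\) that returns the list of factors \((\boldsymbol{e}_i)_i\) and inverts it again via `sem_inn.inverse`:

```python
import torch

@torch.no_grad()
def semantic_edit(encoder, decoder, sem_inn, x, factor_idx, new_factor):
    z_hat = encoder(x)                    # z_hat = E(x)
    factors = sem_inn(z_hat)              # (e_i)_i = e(z_hat), a list of factors
    factors[factor_idx] = new_factor      # replace a single semantic concept
    z_hat_mod = sem_inn.inverse(factors)  # z_hat* = e^{-1}((e_i*)_i), cf. (7)
    return decoder(z_hat_mod)             # x_hat* = D(z_hat*)
```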

3.3 Implementation Details

In this section, we provide implementation details about the exact training procedure and architecture of all components of our approach, purely for the sake of clarity and completeness. Readers who are already familiar with INNs, or who are more interested in the high-level ideas of our approach than in its technical details, can safely skip this section.

Autoencoder \(\boldsymbol{E}, \boldsymbol{D}\): In Sect. 3.1, we introduced an autoencoder to obtain a representation \(\boldsymbol{\hat{z}}\) of \(\boldsymbol{x}\), which includes the invariances abstracted away by a given model representation \(\boldsymbol{z}\). This autoencoder consists of an encoder \(\boldsymbol{E}(\boldsymbol{x})\) and a decoder \(\boldsymbol{D}(\boldsymbol{\hat{z}})\).

Because the INNs \(\boldsymbol{t}\) and \(\boldsymbol{e}\) transform the distribution of \(\boldsymbol{\hat{z}}\), we must ensure a strictly positive density for \(\boldsymbol{\hat{z}}\) to avoid degenerate solutions. This is readily achieved with a stochastic encoder, i.e., we predict mean \(\boldsymbol{E}(\boldsymbol{x})_{\boldsymbol{\mu }}\) and diagonal \(\boldsymbol{E}(\boldsymbol{x})_{\boldsymbol{\sigma }^2}\) of a Gaussian distribution, and obtain the desired representation as \(\boldsymbol{\hat{z}}\sim \mathcal {N}(\boldsymbol{\hat{z}}\vert \boldsymbol{E}(\boldsymbol{x})_{\boldsymbol{\mu }}, {\text {diag}}(\boldsymbol{E}(\boldsymbol{x})_{\boldsymbol{\sigma }^2}))\). Following [DW19], we train this autoencoder as a variational autoencoder using the reparameterization trick [KW14, RMW14] to match the encoded distribution to a standard normal distribution, and jointly learn the scalar output variance \(\gamma \) under an image metric \(\Vert \boldsymbol{x}- \boldsymbol{\hat{x}}\Vert \) to avoid blurry reconstructions. The resulting loss function is thus

(8)

Note that \(\sqrt{(\cdot )}\) and \(\log (\cdot )\) on multi-dimensional entities are applied element-wise. In the experiments shown in this chapter, we use images of spatial resolutions \(28 \times 28\) and \(128 \times 128\), resulting in different architectures for the autoencoder, summarized in Tables 1 and 2, respectively. For the encoder \(\boldsymbol{E}\) processing images of spatial resolution \(128 \times 128\), we use an architecture based on ResNet-101 [HZRS16], and for the corresponding decoder \(\boldsymbol{D}\) we use an architecture based on BigGAN [BDS19], where we include a small fully connected network to replace the class conditioning used in BigGAN by a conditioning on \(\boldsymbol{\hat{z}}\).
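
For illustration, this objective can be sketched as follows, assuming a plain squared-error image metric and the standard diagonal-Gaussian KL term; `enc`, `dec`, and the learnable scalar `log_gamma` (the log of the output variance \(\gamma \)) are placeholder names, and the exact weighting of the terms may differ from the one used in our experiments.

```python
import torch

def vae_loss(enc, dec, log_gamma, x):
    mu, logvar = enc(x)                                          # stochastic encoder E(x)
    z_hat = mu + torch.exp(0.5 * logvar) * torch.randn_like(mu)  # reparameterization trick
    x_rec = dec(z_hat)
    # reconstruction under a learned scalar output variance gamma (cf. [DW19])
    rec = ((x - x_rec) ** 2).flatten(1).sum(dim=1) / log_gamma.exp() \
          + x[0].numel() * log_gamma
    # KL divergence of N(mu, diag(sigma^2)) from the standard normal prior
    kl = 0.5 * (mu ** 2 + logvar.exp() - 1.0 - logvar).flatten(1).sum(dim=1)
    return (rec + kl).mean()
```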

Table 1 Autoencoder architecture on datasets with images of resolution \(28 \times 28\)
Table 2 Autoencoder architecture for datasets with images of resolution \(128 \times 128\)

In the \(28\times 28\) case, we use a squared \(L_2\) loss for the image metric, which corresponds to the first term in (8). For our \(128 \times 128\)-models, we further use an improved metric as in [DB16], which includes additional perceptual [ZIE+18] and discriminator losses. The perceptual loss consists of \(L_1\) feature distances obtained from different layers of a fixed, pretrained network. We use a VGG-16 network pretrained on ImageNet and weighted distances of different layers as in [ZIE+18]. The discriminator is trained along with the autoencoder to distinguish reconstructed images from real images using a binary classification loss, and the autoencoder maximizes the log-probability that reconstructed images are classified as real images. The architectures of VGG-16 and the discriminator are summarized in Table 3.
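
A sketch of such a perceptual term is given below; the selected layers and uniform weights are placeholders for illustration (the actual selection and weighting follow [ZIE+18]), and a recent torchvision is assumed for loading the ImageNet-pretrained VGG-16.

```python
import torch
import torchvision

class PerceptualLoss(torch.nn.Module):
    def __init__(self, layer_ids=(3, 8, 15, 22), weights=(1.0, 1.0, 1.0, 1.0)):
        super().__init__()
        # fixed, ImageNet-pretrained VGG-16 feature extractor
        self.vgg = torchvision.models.vgg16(weights="IMAGENET1K_V1").features.eval()
        for p in self.vgg.parameters():
            p.requires_grad_(False)
        self.layer_ids, self.weights = set(layer_ids), weights

    def _features(self, x):
        feats = []
        for i, layer in enumerate(self.vgg):
            x = layer(x)
            if i in self.layer_ids:
                feats.append(x)
        return feats

    def forward(self, x, x_rec):
        # weighted L1 distances between features of input and reconstruction
        loss = 0.0
        for w, f, f_rec in zip(self.weights, self._features(x), self._features(x_rec)):
            loss = loss + w * (f - f_rec).abs().mean()
        return loss
```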

Table 3 Architectures used to compute image metrics for the autoencoder, which were used for training the autoencoder \(\boldsymbol{E}, \, \boldsymbol{D}\) on datasets with images of resolution \(128 \times 128\)

Details on the INN \(\boldsymbol{e}\) for Revealing Semantics of Deep Representations: Previous works have successfully applied INNs for density estimation [DSB17], inverse problems [AKW+19], and on top of autoencoder representations [ERO20, XYA19, DMB+21, BMDO21b, BMDO21a] for a wide range of applications such as video synthesis [DMB+21, BMDO21b] and translation between pretrained, powerful networks [REO20]. This section provides details on how we embed the approach of [ERO20] to reveal the semantic concepts of autoencoder representations \(\boldsymbol{\hat{z}}\), cf. Sect. 3.2.

Since we will never have examples for all relevant semantic concepts, we include a residual concept that captures the remaining variability of \(\boldsymbol{\hat{z}}\), which is not explained by the given semantic concepts.

Following [ERO20], we learn a bijective transformation \(\boldsymbol{e}(\boldsymbol{\hat{z}})\), which translates the non-interpretable representation \(\boldsymbol{\hat{z}}\) invertibly into a factorized representation \((\boldsymbol{e}_i(\boldsymbol{\hat{z}}))_{i=0}^K=\boldsymbol{e}(\boldsymbol{\hat{z}})\), where each factor \(\boldsymbol{e}_i \in \mathbb {R}^{N_{\boldsymbol{e}_{i}}}\) represents one of the given semantic concepts for \(i = 1,\dots ,K\), and \(\boldsymbol{e}_0 \in \mathbb {R}^{N_{\boldsymbol{e}_{0}}}\) is the residual concept.

The INN \(\boldsymbol{e}\) establishes a one-to-one correspondence between an encoding and different semantic concepts and, conversely, enables semantic modifications to correctly alter the original encoding (see next section). Being an INN, \(\boldsymbol{e}(\boldsymbol{\hat{z}})\) and \(\boldsymbol{\hat{z}}\) need to have the same dimensionality and we set \(N_{\boldsymbol{e}_{0}} = N_{\boldsymbol{\hat{z}}}- \sum _{i=1}^KN_{\boldsymbol{e}_{i}}\). We denote the indices of concept i with respect to \(\boldsymbol{e}(\boldsymbol{\hat{z}})\) as \(\mathcal {I}_i \subset \{1, \dots , N_{\boldsymbol{\hat{z}}}\}\) such that we can write \(\boldsymbol{e}_i = (\boldsymbol{e}(\boldsymbol{\hat{z}})_k)_{k\in \mathcal {I}_i}\).

In the following, we focus on deriving a loss function for training the semantic INN. Let \(\boldsymbol{e}_i\) be the factor representing some semantic concept, e.g., gender, that the contents of two images \(\boldsymbol{x}^\mathrm {a}, \boldsymbol{x}^\mathrm {b}\) share. Then the projection of their encodings \(\boldsymbol{\hat{z}}^\mathrm {a}, \boldsymbol{\hat{z}}^\mathrm {b}\) onto this semantic concept must be similar [ERO20, KWKT15],

$$\begin{aligned} \boldsymbol{e}_i(\boldsymbol{\hat{z}}^\mathrm {a}) \simeq \boldsymbol{e}_i(\boldsymbol{\hat{z}}^\mathrm {b}) \quad \text {where } \boldsymbol{\hat{z}}^\mathrm {a} = \boldsymbol{E}(\boldsymbol{x}^\mathrm {a}), \boldsymbol{\hat{z}}^\mathrm {b} = \boldsymbol{E}(\boldsymbol{x}^\mathrm {b}) . \end{aligned}$$
(9)

Moreover, to interpret \(\boldsymbol{\hat{z}}\) we are interested in the separate contribution of different semantic concepts \(\boldsymbol{e}_i\) that explain \(\boldsymbol{\hat{z}}\). Hence, we seek a mapping \(\boldsymbol{e}(\cdot )\) that strives to disentangle different concepts,

$$\begin{aligned} \boldsymbol{e}_i(\boldsymbol{\hat{z}}) \perp \boldsymbol{e}_j(\boldsymbol{\hat{z}}) \quad \forall i \ne j, \boldsymbol{x}\quad \text {where } \boldsymbol{\hat{z}}= E(\boldsymbol{x}) . \end{aligned}$$
(10)

The objectives in (9), (10) imply a correlation in \(\boldsymbol{e}_i\) for pairs \(\boldsymbol{\hat{z}}^\mathrm {a}\) and \(\boldsymbol{\hat{z}}^\mathrm {b}\) and no correlation between concepts \(\boldsymbol{e}_i, \boldsymbol{e}_j\) for \(i \ne j\). This calls for a Gaussian distribution with a covariance matrix that reflects these requirements.

Let \(\boldsymbol{e}^\mathrm {a}=(\boldsymbol{e}^\mathrm {a}_i) = (\boldsymbol{e}_i(\boldsymbol{E}(\boldsymbol{x}^\mathrm {a})))\) and \(\boldsymbol{e}^\mathrm {b}\) likewise, where \(\boldsymbol{x}^\mathrm {a}, \boldsymbol{x}^\mathrm {b}\) are samples from a training distribution \(p(\boldsymbol{x}^\mathrm {a}, \boldsymbol{x}^\mathrm {b})\) for the ith semantic concept. The distribution of pairs \(\boldsymbol{e}^\mathrm {a}\) and \(\boldsymbol{e}^\mathrm {b}\) factorizes into a conditional and a marginal,

$$\begin{aligned} p(\boldsymbol{e}^\mathrm {a}, \boldsymbol{e}^\mathrm {b}) = p(\boldsymbol{e}^\mathrm {b} \vert \boldsymbol{e}^\mathrm {a}) p(\boldsymbol{e}^\mathrm {a}). \end{aligned}$$
(11)

Objective (10) implies a diagonal covariance for the marginal distribution \(p(\boldsymbol{e}^\mathrm {a})\), i.e., a standard normal distribution, and (9) entails a correlation between \(\boldsymbol{e}^\mathrm {a}_i\) and \(\boldsymbol{e}^\mathrm {b}_i\). Therefore, the cross-correlation matrix between \(\boldsymbol{e}^\mathrm {a}\) and \(\boldsymbol{e}^\mathrm {b}\) is \(\boldsymbol{\Sigma }^{\mathrm {ab}}= \rho {\text {diag}}((\delta _{\mathcal {I}_i}(k))_{k=1}^{N_{\boldsymbol{\hat{z}}}})\), where

$$\begin{aligned} \delta _{\mathcal {I}_i}(k) = {\left\{ \begin{array}{ll} 1 &{} \text {if}~k \in \mathcal {I}_i,\\ 0 &{} \text {else}.\\ \end{array}\right. } \end{aligned}$$

By symmetry, \(p(\boldsymbol{e}^\mathrm {b}) = p(\boldsymbol{e}^\mathrm {a})\), which gives

$$\begin{aligned} p(\boldsymbol{e}^\mathrm {b} \vert \boldsymbol{e}^\mathrm {a}) = \mathcal {N}(\boldsymbol{e}^\mathrm {b} \vert \boldsymbol{\Sigma }^{\mathrm {ab}}\boldsymbol{e}^\mathrm {a}, \boldsymbol{I}- (\boldsymbol{\Sigma }^{\mathrm {ab}})^2) . \end{aligned}$$
(12)

Inserting (12) and a standard normal distribution for \(p(\boldsymbol{e}^\mathrm {a})\) into (11) yields the negative log-likelihood for a pair \(\boldsymbol{e}^\mathrm {a}, \boldsymbol{e}^\mathrm {b}\).

Given pairs \(\boldsymbol{x}^\mathrm {a}, \boldsymbol{x}^\mathrm {b}\) as training data, another change of variables from \(\boldsymbol{\hat{z}}^\mathrm {a}=\boldsymbol{E}(\boldsymbol{x}^\mathrm {a})\) to \(\boldsymbol{e}^\mathrm {a}=\boldsymbol{e}(\boldsymbol{\hat{z}}^\mathrm {a})\) gives the training loss function for \(\boldsymbol{e}\) as the negative log-likelihood of \(\boldsymbol{\hat{z}}^\mathrm {a}, \boldsymbol{\hat{z}}^\mathrm {b}\),

$$\begin{aligned} J(\boldsymbol{e}) = \mathbb {E}_{\boldsymbol{x}^\mathrm {a}, \boldsymbol{x}^\mathrm {b}}&\left[ -\log p(\boldsymbol{e}(\boldsymbol{E}(\boldsymbol{x}^\mathrm {a})), \boldsymbol{e}(\boldsymbol{E}(\boldsymbol{x}^\mathrm {b}))) \right. \nonumber \\&\left. -\log \vert \det \nabla \boldsymbol{e}(\boldsymbol{E}(\boldsymbol{x}^\mathrm {a})) \vert -\log \vert \det \nabla \boldsymbol{e}(\boldsymbol{E}(\boldsymbol{x}^\mathrm {b})) \vert \right] . \end{aligned}$$
(13)

For simplicity, we have derived the loss for a single semantic concept \(\boldsymbol{e}_i\). Simply summing over the losses of different semantic concepts yields their joint loss function and allows us to learn a joint translator \(\boldsymbol{e}\) for all of them.

We now make the log-likelihood of pairs \(\boldsymbol{e}^\mathrm {a}, \boldsymbol{e}^\mathrm {b}\) appearing in the loss (13) explicit. Inserting (12) and a standard normal distribution for \(p(\boldsymbol{e}^\mathrm {a})\) into (11) yields

$$\begin{aligned} -\log p(\boldsymbol{e}^\mathrm {a}, \boldsymbol{e}^\mathrm {b}) = \frac{1}{2} \left( \sum _{k\in \mathcal {I}_i} \frac{(\boldsymbol{e}^\mathrm {b}_k - \rho \boldsymbol{e}^\mathrm {a}_k)^2}{1-\rho ^2} + \sum _{k\in \mathcal {I}_i^c} (\boldsymbol{e}^\mathrm {b}_k)^2 + \sum _{k=1}^{N_{\boldsymbol{\hat{z}}}} (\boldsymbol{e}^\mathrm {a}_k)^2 \right) + C, \end{aligned}$$
(14)

where \(C=C(\rho , N_{\boldsymbol{\hat{z}}})\) is a constant that can be ignored for the optimization process. \(\rho \in (0,1)\) determines the relative importance of loss terms corresponding to the similarity requirement in (9) and the independence requirement in (10). We use a fixed value of \(\rho =0.9\) for all experiments.
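
For reference, the pair term of (14) can be computed as in the following sketch; `mask_i` is a hypothetical boolean mask over the channels belonging to \(\mathcal {I}_i\), and `e_a`, `e_b` are batches of transformed encodings.

```python
import torch

def pair_nll(e_a, e_b, mask_i, rho=0.9):
    # channels in I_i are tied between the pair, the rest is standard normal
    shared = ((e_b - rho * e_a) ** 2)[:, mask_i].sum(dim=1) / (1.0 - rho ** 2)
    rest = (e_b ** 2)[:, ~mask_i].sum(dim=1)   # remaining channels of e^b
    marginal = (e_a ** 2).sum(dim=1)           # standard normal marginal of e^a
    return 0.5 * (shared + rest + marginal)    # (14) up to the constant C(rho, N)
```

The full objective (13) additionally subtracts the two log-determinant terms and sums this pair term over all annotated concepts.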

Fig. 2

A single invertible block used to build our invertible neural networks

Fig. 3

Architectures of our INN models. top: The semantic INN \(\boldsymbol{e}\) consists of stacked invertible blocks. bottom: The conditional INN \(\boldsymbol{t}\) is composed of an embedding module \(\boldsymbol{H}\) that downsamples (or upsamples, if necessary) a given model representation into an embedding \(\boldsymbol{h} = \boldsymbol{H}(\boldsymbol{z}) = \boldsymbol{H}(\boldsymbol{\Phi }(\boldsymbol{x}))\). Subsequently, \(\boldsymbol{h}\) is concatenated with the inputs of each block of the invertible model

In the following, we describe the architecture of the semantic INN. In our implementation, \(\boldsymbol{e}\) is built by stacking invertible blocks, see Fig. 2, which consist of three invertible layers: coupling blocks [DSB17], actnorm layers [KD18], and shuffling layers. The final output is split into the factors \((\boldsymbol{e}_i)\), see Fig. 3.

Coupling blocks split their input \(\boldsymbol{x}= (\boldsymbol{x}_1 , \boldsymbol{x}_2)\) along the channel dimension and use fully connected neural networks \(\boldsymbol{s}_i\) and \(\boldsymbol{\tau }_i\) to perform the following computation:

$$\begin{aligned} \tilde{\boldsymbol{x}}_1&= \boldsymbol{x}_1\odot \boldsymbol{s}_1(\boldsymbol{x}_2) + \boldsymbol{\tau }_1(\boldsymbol{x}_2), \end{aligned}$$
(15)
$$\begin{aligned} \tilde{\boldsymbol{x}}_2&= \boldsymbol{x}_2\odot \boldsymbol{s}_2(\tilde{\boldsymbol{x}}_1) + \boldsymbol{\tau }_2(\tilde{\boldsymbol{x}}_1), \end{aligned}$$
(16)

with the element-wise multiplication operator \(\odot \). Actnorm layers consist of learnable shift and scale parameters for each channel, which are initialized to ensure activations with zero mean and unit variance on the first training batch. Shuffling layers use a fixed, randomly initialized permutation to shuffle the channels of their input, which provides a better mixing of channels for subsequent coupling layers.
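
A minimal sketch of such an invertible block follows. The scale networks here output a log-scale that is exponentiated, a common implementation choice for guaranteeing invertibility that (15)–(16) leave implicit; the actnorm layer is omitted for brevity.

```python
import torch
import torch.nn as nn

class CouplingBlock(nn.Module):
    """Affine coupling as in (15)-(16), with an exponentiated log-scale."""
    def __init__(self, dim, hidden=512):
        super().__init__()
        self.d = dim // 2
        def subnet(d_in, d_out):
            return nn.Sequential(nn.Linear(d_in, hidden), nn.ReLU(),
                                 nn.Linear(hidden, d_out))
        self.s1, self.t1 = subnet(dim - self.d, self.d), subnet(dim - self.d, self.d)
        self.s2, self.t2 = subnet(self.d, dim - self.d), subnet(self.d, dim - self.d)

    def forward(self, x):
        x1, x2 = x[:, :self.d], x[:, self.d:]
        ls1 = self.s1(x2)
        y1 = x1 * ls1.exp() + self.t1(x2)                           # (15)
        ls2 = self.s2(y1)
        y2 = x2 * ls2.exp() + self.t2(y1)                           # (16)
        return torch.cat([y1, y2], dim=1), ls1.sum(1) + ls2.sum(1)  # output, log|det|

    def inverse(self, y):
        y1, y2 = y[:, :self.d], y[:, self.d:]
        x2 = (y2 - self.t2(y1)) * (-self.s2(y1)).exp()
        x1 = (y1 - self.t1(x2)) * (-self.s1(x2)).exp()
        return torch.cat([x1, x2], dim=1)

class Shuffle(nn.Module):
    """Fixed random channel permutation for better mixing between couplings."""
    def __init__(self, dim):
        super().__init__()
        self.register_buffer("perm", torch.randperm(dim))
        self.register_buffer("inv_perm", torch.argsort(self.perm))

    def forward(self, x):
        return x[:, self.perm]

    def inverse(self, y):
        return y[:, self.inv_perm]
```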

Fig. 4

Graphical distinction of information flow during training and inference. During training of \(\boldsymbol{t}\), the encoder \(\boldsymbol{E}\) provides an (approximately complete) data representation, which is used to learn the invariances of a given model’s representations \(\boldsymbol{z}\). At inference, the encoder is not necessarily needed anymore: given a representation \(\boldsymbol{z}= \boldsymbol{\Phi }(\boldsymbol{x})\), invariances can be sampled from the prior distribution, combined with \(\boldsymbol{z}\) through \(\boldsymbol{t}^{-1}\), and decoded into data space via \(\boldsymbol{D}\)

Conditional INN \(\boldsymbol{t}\) for recovering invariances of deep representations: We first elaborate on the architecture of the conditional INN. We build the conditional invertible neural network \(\boldsymbol{t}\) by expanding the semantic model \(\boldsymbol{e}\) as follows: Given a model representation \(\boldsymbol{z}\), which is used as the conditioning of the INN, we first calculate its embedding

$$\begin{aligned} \boldsymbol{h} = \boldsymbol{H}(\boldsymbol{z}) \end{aligned}$$
(17)

which is subsequently fed into the affine coupling block:

$$\begin{aligned} \tilde{\boldsymbol{x}}_1&= \boldsymbol{x}_1\odot \boldsymbol{s}_1(\boldsymbol{x}_2, \boldsymbol{h}) + \boldsymbol{\tau }_1(\boldsymbol{x}_2, \boldsymbol{h}), \end{aligned}$$
(18)
$$\begin{aligned} \tilde{\boldsymbol{x}}_2&= \boldsymbol{x}_2\odot \boldsymbol{s}_2(\tilde{\boldsymbol{x}}_1, \boldsymbol{h}) + \boldsymbol{\tau }_2(\tilde{\boldsymbol{x}}_1, \boldsymbol{h}), \end{aligned}$$
(19)

with \(\odot \) again being an element-wise multiplication operator, where \(\boldsymbol{s}_i\) and \(\boldsymbol{\tau }_i\) are modified from (16) such that they are capable of processing a concatenated input \((\boldsymbol{x}_i, \boldsymbol{h})\). The embedding module \(\boldsymbol{H}\) is usually a shallow convolutional neural network used to down-/upsample a given model representation \(\boldsymbol{z}\) to a size that the networks \(\boldsymbol{s}_i\) and \(\boldsymbol{\tau }_i\) are able to process. This means that \(\boldsymbol{t}\), analogous to \(\boldsymbol{e}\), consists of stacked invertible blocks, where each block is composed of coupling blocks, actnorm layers, and shuffling layers, cf. Sect. 3.3, details on the INN \(\boldsymbol{e}\) for revealing semantics of deep representations, and Fig. 2. The complete architectures of both \(\boldsymbol{t}\) and \(\boldsymbol{e}\) are depicted in Fig. 3. Additionally, Fig. 4 provides a graphical distinction of the training and testing process of \(\boldsymbol{t}\). During training, the autoencoder \(\boldsymbol{D}\circ \boldsymbol{E}\) provides a representation of the data that contains both the invariances and the representation of some model w.r.t. the input \(\boldsymbol{x}\). After training of \(\boldsymbol{t}\), the encoder may be discarded and visual decodings and/or semantic interpretations of a model representation \(\boldsymbol{z}\) can be obtained by sampling and transforming \(\boldsymbol{v}\) as described in (2).
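
A self-contained sketch of such a conditional coupling block is given below; as before, the exponentiated log-scale is an implementation choice not spelled out in (18)–(19), and the embedding `h` would be produced by the small network \(\boldsymbol{H}\).

```python
import torch
import torch.nn as nn

class ConditionalCouplingBlock(nn.Module):
    """Affine coupling of (18)-(19): the embedding h = H(z) of the model
    representation is concatenated to the inputs of s_i and tau_i."""
    def __init__(self, dim, cond_dim, hidden=512):
        super().__init__()
        self.d = dim // 2
        def subnet(d_in, d_out):
            return nn.Sequential(nn.Linear(d_in + cond_dim, hidden), nn.ReLU(),
                                 nn.Linear(hidden, d_out))
        self.s1, self.t1 = subnet(dim - self.d, self.d), subnet(dim - self.d, self.d)
        self.s2, self.t2 = subnet(self.d, dim - self.d), subnet(self.d, dim - self.d)

    def forward(self, x, h):                      # x: part of z_hat, h: H(Phi(x))
        x1, x2 = x[:, :self.d], x[:, self.d:]
        ls1 = self.s1(torch.cat([x2, h], 1))
        y1 = x1 * ls1.exp() + self.t1(torch.cat([x2, h], 1))        # (18)
        ls2 = self.s2(torch.cat([y1, h], 1))
        y2 = x2 * ls2.exp() + self.t2(torch.cat([y1, h], 1))        # (19)
        return torch.cat([y1, y2], 1), ls1.sum(1) + ls2.sum(1)

    def inverse(self, y, h):                      # used to sample z_hat = t^{-1}(v | z)
        y1, y2 = y[:, :self.d], y[:, self.d:]
        x2 = (y2 - self.t2(torch.cat([y1, h], 1))) * (-self.s2(torch.cat([y1, h], 1))).exp()
        x1 = (y1 - self.t1(torch.cat([x2, h], 1))) * (-self.s1(torch.cat([x2, h], 1))).exp()
        return torch.cat([x1, x2], 1)
```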

4 Experiments

To explore the applicability of our approach, we conduct experiments on several models: SqueezeNet [IHM+16], which provides lightweight classification; FaceNet [SKP15], a baseline for face recognition and clustering trained on the VGGFace2 dataset [CSX+18]; and variants of ResNet [HZRS16], a popular architecture often used when fine-tuning a classifier on a specific task and dataset.

Experiments are conducted on the following datasets: CelebA [LLWT15], AnimalFaces [LHM+19], Animals (containing carnivorous animals), ImageNet [DDS+09], and ColorMNIST, which is an augmented version of the MNIST dataset [LCB98], where both background and foreground have random, independent colors. Evaluation details follow in Sect. 4.5.
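
For reference, a ColorMNIST-style dataset can be constructed roughly as follows; this is our reading of the construction, and the exact recipe (color ranges, normalization) may differ from the one used in our experiments.

```python
import torch
from torchvision import datasets, transforms

def colorize(img):
    # give the digit a random foreground and an independent random background color
    img = img.expand(3, -1, -1)       # grayscale (1, 28, 28) -> 3 channels
    fg = torch.rand(3, 1, 1)          # random foreground color
    bg = torch.rand(3, 1, 1)          # independent random background color
    return img * fg + (1.0 - img) * bg

dataset = datasets.MNIST(root="./data", train=True, download=True,
                         transform=transforms.Compose([transforms.ToTensor(), colorize]))
```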

4.1 Comparison to Existing Methods

A key insight of our chapter is that reconstructions from a given model’s representation \(\boldsymbol{z}= \boldsymbol{\Phi }(\boldsymbol{x})\) are impossible if the invariances the model has learned are not considered. In Fig. 5, we compare our approach to existing methods that either try to reconstruct the image via gradient-based optimization [MV16] or by training a reconstruction network directly on the representations \(\boldsymbol{z}\) [DB16]. By conditionally sampling images \(\boldsymbol{\hat{x}}= \boldsymbol{D}(\boldsymbol{\hat{z}})\), where we obtain \(\boldsymbol{\hat{z}}\) via the INN \(\boldsymbol{t}\) as described in (2) based on the invariances \(\boldsymbol{v}\sim p(\boldsymbol{v}) = \mathcal {N}(\boldsymbol{0}, \boldsymbol{I})\), we bypass this shortcoming and obtain natural images without artifacts for any layer depth. The increased image quality is further confirmed by the Fréchet inception distance (FID) scores [HRU+17] reported in Table 4.

Fig. 5

Comparison to existing network inversion methods for AlexNet [KSH12]. In contrast to the methods of [DB16] (D&B) and [MV16] (M&V), our invertible method explicitly samples the invariances of \(\boldsymbol{\Phi }\) w.r.t. the data, which circumvents a common cause for artifacts and produces natural images independent of the depth of the layer from which the image is reconstructed

4.2 Understanding Models

Interpreting a face recognition model: FaceNet [SKP15] is a widely accepted baseline in the field of face recognition. This model embeds input images of human faces into a latent space where similar images have a small \(L_2\)-distance. We aim to understand the process of face recognition within this model by analyzing and visualizing learned invariances for several layers explicitly; see Table 6 for a detailed breakdown of the various layers of FaceNet. For the experiment, we use a pretrained FaceNet and train the generative model presented in (2) by conditioning on various layers. Figure 6 depicts the amount of variance present in each selected layer when generating \(n=250\) samples for each of 100 different input images. This variance serves as a proxy for the degree of abstraction FaceNet has learned in the respective layer: more abstract representations allow for a rich variety of corresponding synthesized images, resulting in a large variance in image space when being decoded. We observe approximately exponential growth of learned invariances with increasing layer depth, suggesting that abstraction happens mainly in the deepest layers of the network. Furthermore, we are able to synthesize images that correspond to the given model representation for each selected layer.

Table 4 FID scores for layer visualizations of AlexNet, obtained with our method and [DB16] (D&B). Scores are calculated on the Animals dataset
Fig. 6

Left: Visualizing FaceNet representations and their invariances. Sampling multiple reconstructions \(\boldsymbol{\hat{x}}= \boldsymbol{D}(\boldsymbol{t}^{-1}(\boldsymbol{v}\vert \boldsymbol{z}_\ell ))\) shows the degree of invariance learned by different layers \(\ell \). The invariance w.r.t. pose increases for deeper layers as expected for face identification. Surprisingly, FaceNet uses glasses as an identity feature throughout all its layers as evident from the spatial mean and variance plots, where the glasses are still visible. This reveals a bias and weakness of the model. Right: spatially averaged variances over multiple \(\boldsymbol{x}\) and layers

Fig. 7

Analyzing how the degree to which different semantic concepts are captured by a network representation changes as training progresses. For SqueezeNet on ColorMNIST, we measure how much the data varies in different semantic concepts \(\boldsymbol{e}_i\) and how much of this variability is captured by \(\boldsymbol{z}\) at different training iterations. Early on, \(\boldsymbol{z}\) is sensitive to foreground and background color, and later on it learns to focus on the digit attribute. The ability to encode this semantic concept is proportional to the classification accuracy achieved by \(\boldsymbol{z}\). At training iterations 4k and 36k, we apply our method to visualize model representations and thereby illustrate how their content changes during training

How does the relevance of different concepts emerge during training? Humans tend to explain entities by describing them in terms of their semantics, e.g., size or color. In a similar fashion, we want to semantically understand how a network (here: SqueezeNet) learns to solve a given problem.

Intuitively, a network should, for example, be able to solve a given classification problem by focusing on the relevant information while discarding task-irrelevant information. To build on this intuition, we construct a toy problem: digit classification on ColorMNIST. We expect the model to ignore both the random background and foreground colors of the input data, as they do not help in making the classification decision. Thus, we apply the invertible approach presented in Sect. 3.2 and recover three distinct factors: digit class, background color, and foreground color. To capture the semantic changes occurring over the course of training of this classifier, we train 20 instances of the invertible interpretation model on the last convolutional layer, one for each of 20 equally spaced checkpoints between iteration 0 and iteration 40000. The result is shown in Fig. 7: we see that the digit factor becomes increasingly relevant, its relevance being strongly correlated with the accuracy of the model.

Fig. 8

Visualizing FGSM adversarial attacks on ResNet-101. To the human eye, the original image and its attacked version are almost indistinguishable. However, the input image is correctly classified as “Siamese cat”, while the attacked version is classified as “mountain lion”. Our approach visualizes how the attack spreads throughout the network. Reconstructions of representations of attacked images demonstrate that the attack targets the semantic content of deep layers. The variance of \(\boldsymbol{\hat{z}}\) explained by \(\boldsymbol{v}\) combined with these visualizations shows how increasing invariances cause vulnerability to adversarial attacks

Fig. 9

Revealing texture bias in ImageNet classifiers. We compare visualizations of \(\boldsymbol{z}\) from the penultimate layer of ResNet-50 trained on standard ImageNet (left) and a stylized version of ImageNet (right). On natural images (rows 1–3), both models recognize the input. Removing textures through stylization (rows 4–6) makes images unrecognizable to the standard model, which, however, still recognizes objects from textured patches (rows 7–9). Rows 10–12 show that a model without texture bias can be used for sketch-to-image synthesis

4.3 Effects of Data Shifts on Models

This section investigates the effects that altering input data has on the model we want to understand. We examine these effects by manipulating input data through adversarial attacks or image stylization.

How do adversarial attacks affect network representations? Here, we experiment with the fast gradient sign method (FGSM) [GSS15], which manipulates the input image by maximizing the objective of a given classification model. To understand how such an attack modifies representations of a given model, we first compute the image’s invariances with respect to the model as \(\boldsymbol{v}= \boldsymbol{t}(\boldsymbol{E}(\boldsymbol{x}) \vert \boldsymbol{\Phi }(\boldsymbol{x}))\). For an attacked image \(\boldsymbol{x}^{*}\), we then compute the attacked representation as \(\boldsymbol{z}^{*}=\boldsymbol{\Phi }(\boldsymbol{x}^{*})\). Decoding this representation with the original invariances \(\boldsymbol{v}\) allows us to precisely visualize what the adversarial attack changed. This decoding, \(\boldsymbol{\hat{x}}^{*} = \boldsymbol{D}(\boldsymbol{t}^{-1}(\boldsymbol{v}\vert \boldsymbol{z}^{*}))\), is shown in Fig. 8. We observe that, over the layers of the network, the adversarial attack gradually changes the representation toward its target. Its ability to do so is strongly correlated with the amount of invariance of a given layer, quantified as the total variance explained by \(\boldsymbol{v}\), as also observed in [JBZB19].
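
A sketch of this procedure, combining the FGSM step with the decoding of the attacked representation under the original invariances; `cinn` again denotes the hypothetical interface to \(\boldsymbol{t}\) from the earlier sketches, and `y` are the ground-truth labels used by the attack.

```python
import torch
import torch.nn.functional as F

def visualize_attack(phi, psi, encoder, decoder, cinn, x, y, eps=0.01):
    # FGSM: a single signed-gradient step that increases the classification loss
    x_adv = x.clone().detach().requires_grad_(True)
    F.cross_entropy(psi(phi(x_adv)), y).backward()
    x_adv = (x_adv + eps * x_adv.grad.sign()).clamp(0, 1).detach()

    with torch.no_grad():
        v, _ = cinn(encoder(x), phi(x))           # invariances of the clean image
        z_adv = phi(x_adv)                        # attacked representation z* = Phi(x*)
        x_vis = decoder(cinn.inverse(v, z_adv))   # decode z* with the original invariances
    return x_adv, x_vis
```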

How does training on different data affect the model? Geirhos et al. [GRM+19] proposed the hypothesis that classification networks based on convolutional blocks mainly focus on texture patterns to obtain class probabilities. We further validate this hypothesis by training our invertible network \(\boldsymbol{t}\) conditioned on pre-logits \(\boldsymbol{z}= \boldsymbol{\Phi }(\boldsymbol{x})\) (i.e., the penultimate layer) of two ResNet-50 realizations. As shown in Fig. 9, a ResNet architecture trained on standard ImageNet is susceptible to the so-called “texture bias”, as samples generated conditioned on representations of pure texture images consistently show valid images of the corresponding input classes. We furthermore visualize that this behavior can indeed be removed by training the same architecture on a stylized version of ImageNet; the classifier then focuses on shape. Rows 10–12 of Fig. 9 show that the proposed approach can be used to generate sketch-based content with the texture-agnostic network.

Fig. 10

Semantic modifications on CelebA. For each column, after inferring the semantic factors \((\boldsymbol{e}_i)_i=\boldsymbol{e}(\boldsymbol{E}(\boldsymbol{x}))\) of the input \(\boldsymbol{x}\), we replace one factor \(\boldsymbol{e}_i\) by that from another randomly chosen image that differs in this concept. The inverse of \(\boldsymbol{e}\) translates this semantic change back into a modified \(\boldsymbol{\hat{z}}\), which is decoded to a semantically modified image. Distances between FaceNet embeddings before and after modification demonstrate its sensitivity to differences in gender and glasses (see also Fig. 6)

4.4 Modifying Representations

Invertible access to semantic concepts enables targeted modifications of representations \(\boldsymbol{\hat{z}}\). In combination with a decoder for \(\boldsymbol{\hat{z}}\), we obtain semantic image editing capabilities. We provide an example in Fig. 10, where we modify the factors hair color, glasses, gender, beard, age, and smile. We infer \(\boldsymbol{\hat{z}}=\boldsymbol{E}(\boldsymbol{x})\) from an input image. Our semantic INN \(\boldsymbol{e}\) then translates this representation into semantic factors \((\boldsymbol{e}_i)_i=\boldsymbol{e}(\boldsymbol{\hat{z}})\), where individual semantic concepts can be modified independently via the corresponding factor \(\boldsymbol{e}_i\). In particular, we can replace each factor with that from another image, effectively transferring semantics from one representation onto another. Due to the invertibility of \(\boldsymbol{e}\), the modified representation can be translated back into the space of the autoencoder and is readily decoded to a modified image \(\boldsymbol{x}^{*}\).

To observe which semantic concepts FaceNet is sensitive to, we compute the average distance \(\Vert \boldsymbol{f}(\boldsymbol{x}) - \boldsymbol{f}(\boldsymbol{x}^{*})\Vert \) between its embeddings of \(\boldsymbol{x}\) and semantically modified \(\boldsymbol{x}^{*}\) over the test set (last row in Fig. 10). Evidently, FaceNet is particularly sensitive to differences in gender and glasses. The latter suggests a failure of FaceNet to identify persons correctly after they put on glasses.
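
This sensitivity measurement amounts to a single averaged embedding distance per modified concept; a minimal sketch (`facenet`, `images`, and `edited_images` are placeholder names):

```python
import torch

@torch.no_grad()
def concept_sensitivity(facenet, images, edited_images):
    # average embedding distance ||f(x) - f(x*)|| for one modified concept
    diff = facenet(images) - facenet(edited_images)
    return diff.pow(2).sum(dim=1).sqrt().mean().item()
```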

Table 5 Hyperparameters of INNs for each experiment. Parameter \(n_{flow}\) denotes the number of invertible blocks within the model, see Fig. 2. Parameters \(h_w\) and \(h_d\) refer to the width and depth of the fully connected subnetworks \(\boldsymbol{s}_i\) and \(\boldsymbol{\tau }_i\), respectively
Table 6 High-level architectures of FaceNet and ResNet, depicted as pytorch-modules. Layers investigated in our experiments are marked in bold. Spatial sizes are provided as a visual aid and vary from model to model in our experiments. If not stated otherwise, we always extract from the last layer in a series of blocks (e.g., on the right: \(23\times \) BottleNeck down \(\rightarrow \mathbb {R}^{8 \times 8 \times 1024}\) refers to the last module in the series of 23 blocks)
Table 7 High-level architectures of SqueezeNet and AlexNet, depicted as pytorch-modules. See Table 6 for further details

4.5 Evaluation Details

Here, we provide additional details on the neural networks investigated in Sect. 4 and present a way to quantify the amount of invariances in those networks. This section is included only for completeness. As with Sect. 3.3, readers who are interested in the high-level concepts of our approach rather than its technical details can skip this section. An overview of INN hyperparameters for all experiments is provided in Table 5.

Throughout our experiments, we interpret four different models: SqueezeNet, AlexNet, ResNet, and FaceNet. Summaries of each model’s architecture are provided in Table 6 and Table 7. Implementations and pretrained weights of these models are taken from publicly available sources.

Explained variance: To quantify the amount of invariances and semantic concepts, we use the fraction of the total variance explained by invariances (Fig. 8) and the fraction of the variance of a semantic concept explained by the model representation (Fig. 7).

Using the INN \(\boldsymbol{t}\), we can consider \(\boldsymbol{\hat{z}}= \boldsymbol{t}^{-1}(\boldsymbol{v}\vert \boldsymbol{z})\) as a function of \(\boldsymbol{v}\) and \(\boldsymbol{z}\). The total variance of \(\boldsymbol{\hat{z}}\) is then obtained by sampling \(\boldsymbol{v}\) from its standard normal prior and \(\boldsymbol{z}\) via \(\boldsymbol{z}= \boldsymbol{\Phi }(\boldsymbol{x})\) with \(\boldsymbol{x}\sim p_{\text {valid}}(\boldsymbol{x})\) drawn from a validation set. We compare this total variance to the average variance obtained when sampling \(\boldsymbol{v}\) for a given \(\boldsymbol{z}\) to obtain the fraction of the total variance explained by invariances:

(20)

In combination with the INN \(\boldsymbol{e}\), which transforms \(\boldsymbol{\hat{z}}\) into semantically meaningful factors, we can analyze the semantic content of a model representation \(\boldsymbol{z}\). To analyze how much of a semantic concept represented by factor \(\boldsymbol{e}_i\) is captured by \(\boldsymbol{z}\), we use \(\boldsymbol{e}\) to transform \(\boldsymbol{\hat{z}}\) into \(\boldsymbol{e}_i\) and measure its variance. To measure how much of the semantic concept is explained by \(\boldsymbol{z}\), we simply swap the roles of \(\boldsymbol{z}\) and \(\boldsymbol{v}\) in (20), to obtain

(21)

Figure 8 reports (20) and its standard error when evaluated via 10k samples, and Fig. 7 reports (21) and its standard error when evaluated via 10k samples.
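
Both fractions can be estimated by Monte-Carlo sampling. A sketch for the fraction in (20), assuming a loader yielding (image, label) batches and the same hypothetical `cinn` interface as in the earlier sketches; the fraction in (21) follows analogously by swapping the roles of \(\boldsymbol{v}\) and \(\boldsymbol{z}\) and reading off the variance of the factor \(\boldsymbol{e}_i\) after applying \(\boldsymbol{e}\).

```python
import torch

@torch.no_grad()
def invariance_explained_fraction(cinn, phi, loader, n_v=16):
    per_z_var, samples = [], []
    for x, _ in loader:
        z = phi(x).repeat_interleave(n_v, dim=0)        # n_v invariance samples per z
        v = torch.randn(z.shape[0], cinn.dim_v)
        z_hat = cinn.inverse(v, z).view(x.shape[0], n_v, -1)
        per_z_var.append(z_hat.var(dim=1).mean(dim=1))  # Var over v for each fixed z
        samples.append(z_hat.reshape(-1, z_hat.shape[-1]))
    total_var = torch.cat(samples).var(dim=0).mean()    # Var over both v and z
    return (torch.cat(per_z_var).mean() / total_var).item()
```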

5 Conclusions

Understanding a representation in terms of both its semantics and learned invariances is crucial for interpretation of deep networks. We presented an approach (i) to recover the invariances a model has learned and (ii) to translate the representation and its invariances onto an equally expressive yet semantically accessible encoding. Our diagnostic method is applicable in a plug-and-play fashion on top of existing deep models with no need to alter or retrain them. Since our translation onto semantic factors is bijective, it loses no information and also allows for semantic modifications. Moreover, recovering invariances probabilistically guarantees that we can correctly visualize representations and sample them without leaving the underlying distribution, which is a common cause for artifacts. Altogether, our approach constitutes a powerful, widely applicable diagnostic pipeline for explaining deep representations.