1 Introduction

In recent years, neural networks have been a driving force behind many of the most exciting success stories in machine learning. From image recognition [24] and speech recognition [10] to playing complex games on a superhuman level [51], neural networks have achieved results that were almost unthinkable even a decade ago.

However, while the size, performance and scope of neural networks steadily increase, their opaqueness remains an equally important and essentially unsolved problem [2]. Frequently denoted as “blackbox”-models, the decisions of neural networks are to this day hard to explain and, likewise, their properties hard to verify.

In this paper, we are concerned with Typed Affine Decision Structures (TADS) [45], a novel decision-tree-like data structure that represents piece-wise affine functions. TADS are specifically designed to act as interpretable white-box models that can precisely represent any piece-wise linear neural network in an understandable fashion.

While TADS are structurally well suited for global model explanation and verification of neural networks, the full explanation of even medium-sized neural networks is well out of scope. It is well known that the semantic complexity of a neural network, with respect to many different measures of complexity, grows exponentially in its size. As a consequence, any precise global explanation of such a model incurs exponential scaling issues [7, 18, 39].

In this paper, we are interested in applying TADS to verifying local properties of neural networks, most notably robustness properties [12, 35, 40, 49]. Robustness properties encode that, at certain points that a user desires, a neural network’s classification is invariant to small changes of its input. For example, in image recognition, if one knows that an image represents a certain object, a single flip of a pixel should not drastically change the network’s correct classification of said image. Robustness properties are the most commonly considered properties in neural network verification and make up the majority of current benchmarks in the VNNComp verification competition [6].

To apply TADS, which are in principle global model explanations, to local properties, we will introduce precondition projection, a transformation of TADS that restricts their domain to a certain region of interest. Further, we show how the algebraic properties of TADS can be used to directly model the classification behavior of a neural network. This is important as neural networks, although often used as classifiers (assigning one of finitely many classes to an input), are fundamentally regression models (assigning real values to their input). By modeling the argmax function directly on a TADS level, this gap can be bridged in an elegant fashion.

Finally, we will present a case study in which we apply TADS to robustness analysis and present its advantages. At present, TADS do not yet scale well to larger problems. We will introduce an approach that uses the well-understood dimensionality reduction technique PCA to prove an underapproximation to the robustness property of interest. This approach mitigates the scaling issues incurred by TADS, but lacks reliable guarantees on robustness. Thus, we introduce another approach that directly trains neural networks to operate on inputs that are simplified by PCA. This method is of similar computational complexity to the underapproximation approach, but yields neural networks for which TADS can give reliable robustness guarantees, while incurring only a small loss in neural network accuracy.

Lastly, we will show on a concrete example what a TADS-based robustness proof looks like and what additional information it yields beyond already existing verification tools. We will show how this information can be used to characterize precisely and completely the entire set of inputs that violate a given property, and how it can be used to find “closest” adversarial examples, if they exist.

2 Preliminaries

2.1 Linear algebra and notation

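The following notations of linear algebra are based on the book [5]. The real vector space \(\mathbb{R}^{n}\) with \(n > 0\) is an algebraic structure with the operations

$$\begin{aligned} + &: \mathbb{R}^{n} \times \mathbb{R}^{n} \to \mathbb{R}^{n} \\ \cdot &: \mathbb{R} \times \mathbb{R}^{n} \to \mathbb{R}^{n} \end{aligned}$$

which are defined as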

$$\begin{aligned} (x_{1},\dots , x_{n}) + (y_{1},\dots ,y_{n}) &= (x_{1} + y_{1}, \dots , x_{n} + y_{n}) \\ \lambda \cdot (x_{1},\dots ,x_{n}) &= (\lambda \cdot x_{1}, \dots , \lambda \cdot x_{n}) \end{aligned}$$

A real vector \((x_{1},\dots ,x_{n})\) of \(\mathbb{R}^{n}\) is abbreviated as \(\vec{x}\). To refer to its \(i\)-th component, we write \(x_{i}\) (in contrast, \(\vec{x_{i}}\) denotes the \(i\)-th vector in some enumeration). The dimension of a real vector space is given as \(\dim (\mathbb{R}^{n}) = n\).

A matrix \(\boldsymbol{W}\) is a collection of real values arranged in a rectangular array with \(n\) rows and \(m\) columns.

$$ \boldsymbol{W} = \begin{pmatrix} w_{1,1} &w_{1,2} & \ldots &w_{1,m} \\ w_{2,1} &w_{2,2} & \ldots &w_{2,m} \\ \vdots &\vdots &\ddots &\vdots \\ w_{n,1} &w_{n,2} & \ldots &w_{n,m} \end{pmatrix} $$

To indicate the number of rows and columns, one says \(\boldsymbol{W}\) has type \(n \times m\), commonly notated as \(\boldsymbol{W} \in \mathbb{R}^{n \times m}\).

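An element at position \(i,j\) of the matrix \(\boldsymbol{W}\) is denoted by \(w_{i,j}\) (where \(1 \leq i \leq n\) and \(1 \leq j \leq m\)). A matrix can be reflected along the main diagonal, resulting in the transpose \(\boldsymbol{W}^{\top }\) of shape \(m \times n\) defined by the equation

$$ (\boldsymbol{W}^{\top })_{i,j} = w_{j,i} $$

The \(i\)-th row of \(\boldsymbol{W}\) can be regarded as a \(1 \times m\) matrix given by

$$ \boldsymbol{W}_{i,\cdot } = (w_{i,1}, w_{i,2}, \dots , w_{i,m}) $$

Similarly, the \(j\)-th column of \(\boldsymbol{W}\) can be regarded as an \(n \times 1\) matrix defined as

$$ \boldsymbol{W}_{\cdot ,j} = (w_{1,j}, w_{2,j}, \dots , w_{n,j})^{\top } $$

Matrix addition is defined over matrices \(\boldsymbol{A}, \boldsymbol{B} \in \mathbb{R}^{n \times m}\) with the same type to be component-wise, i.e.,

$$ (\boldsymbol{A} + \boldsymbol{B})_{i,j} = a_{i,j} + b_{i,j} $$

and scalar multiplication as

$$ (\lambda \cdot \boldsymbol{A})_{i,j} = \lambda \cdot a_{i,j} $$

The (type-correct) multiplication of two matrices \(\boldsymbol{A} \in \mathbb{R}^{n \times k}\) and \(\boldsymbol{B} \in \mathbb{R}^{k \times m}\) is defined as

$$ (\boldsymbol{A} \boldsymbol{B})_{i,j} = \sum _{l=1}^{k} a_{i,l} \cdot b_{l,j} $$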

Identifying

  • \(n \times 1\) matrices with (column) vectors

  • \(1 \times m\) matrices with row vectors

  • \(1 \times 1\) matrices with scalars

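as indicated above, makes the well-known dot product of two vectors \(\vec{x}, \vec{y} \in \mathbb{R}^{n}\)

$$ \mathopen{\langle }\vec{x},\vec{y}\mathclose{\rangle } = \vec{x}^{\top } \vec{y} = \sum _{i=1}^{n} x_{i} \cdot y_{i} $$

just a special case of matrix multiplication. The same holds for matrix-vector multiplication, which is defined for an \(n \times m\) matrix \(\boldsymbol{W}\) and a vector \(\vec{x} \in \mathbb{R}^{m}\) as

$$ (\boldsymbol{W} \vec{x})_{i} = \sum _{j=1}^{m} w_{i,j} \cdot x_{j} $$

Matrices with the same number of rows and columns, i.e., with type \(n\times n\) for some \(n > 0\), are said to be square matrices.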

2.2 Affine functions

Definition 1

Affine Function

A function \(\alpha : \mathbb{R}^{n} \to \mathbb{R}^{m}\) is called affine iff it can be written as

$$ \alpha (\vec{x})= \boldsymbol{W} \vec{x} + \vec{b} $$

for some matrix \(\boldsymbol{W} \in \mathbb{R}^{m \times n}\) and vector \(\vec{b} \in \mathbb{R}^{m}\). We identify the semantics and syntax of affine functions with the pair \((\boldsymbol{W}, \vec{b})\), which can be considered as a canonical representation of affine functions. Furthermore, we denote the set of all affine functions as with type \((n,m)\). The untyped version is meant to refer to the set of all affine functions, independently of their type.

Lemma 1

Operations on Affine Functions

Let \(\alpha _{1},\alpha _{2}\) be two affine functions in canonical form, i.e.,

$$\begin{aligned} \alpha _{1}(\vec{x}) &= \boldsymbol{W_{1}} \vec{x} + \vec{b_{1}} \\ \alpha _{2}(\vec{x}) &= \boldsymbol{W_{2}} \vec{x} + \vec{b_{2}} \end{aligned}$$

Assuming matching types, the operations + (addition), ⋅ (scalar multiplication), and ∘ (function application) can be calculated on the representation as

$$\begin{aligned} (s \cdot \alpha _{1})(\vec{x}) &= (s \cdot \boldsymbol{W_{1}}) \,\vec{x} + (s \cdot \vec {b_{1}}) \\ (\alpha _{1} + \alpha _{2})(\vec{x}) &= (\boldsymbol{W_{1}}+\boldsymbol{W_{2}}) \, \vec{x} + (\vec {b_{1}} + \vec {b_{2}}) \\ (\alpha _{2} \circ \alpha _{1})(\vec{x}) &= (\boldsymbol{W_{2}} \boldsymbol{W_{1}}) \, \vec{x} + (\boldsymbol{W_{2}} \vec {b_{1}} + \vec {b_{2}}) \end{aligned}$$

resulting again in an affine function in canonical representation.
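The operations of Lemma 1 can be sketched in Python with NumPy as follows. This is only an illustration under the assumption that an affine function is stored as the pair \((\boldsymbol{W}, \vec{b})\); the helper names `scale`, `add`, `compose` and `apply_affine` are hypothetical and not taken from [45].

```python
import numpy as np

# An affine function is stored as its canonical pair (W, b).

def scale(s, a):
    W, b = a
    return (s * W, s * b)

def add(a1, a2):
    (W1, b1), (W2, b2) = a1, a2
    return (W1 + W2, b1 + b2)

def compose(a2, a1):
    """Return a2 after a1, i.e. the function x -> a2(a1(x))."""
    (W1, b1), (W2, b2) = a1, a2
    return (W2 @ W1, W2 @ b1 + b2)

def apply_affine(a, x):
    W, b = a
    return W @ x + b

# a1 has type (2, 3), a2 has type (3, 1), so the composition has type (2, 1).
a1 = (np.array([[1.0, 2.0], [0.0, 1.0], [1.0, -1.0]]), np.array([1.0, 0.0, -1.0]))
a2 = (np.array([[1.0, 0.0, 2.0]]), np.array([0.5]))
x = np.array([2.0, -1.0])

composed = compose(a2, a1)
direct = apply_affine(a2, apply_affine(a1, x))
via_pair = apply_affine(composed, x)
```

Evaluating the composed pair and evaluating both functions one after the other give the same value, which is exactly the third equation of the lemma.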

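It is well known that the type resulting from function composition evolves as follows: if \(\alpha _{1}\) has type \((n,m)\) and \(\alpha _{2}\) has type \((m,k)\), then \(\alpha _{2} \circ \alpha _{1}\) has type \((n,k)\).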

The type of the operation is important for the closure axiom, the basis for most algebraic structures. This leads to the following well-known theorem [5]:

Theorem 1

Algebraic Properties

Denoting, as usual, scalar multiplication with ⋅ and function composition with ∘, we have:

  • is a vector space and

  • is a monoid.

This theorem can straightforwardly be lifted to the untyped version by simply restricting all operations to the cases where they are well-typed, i.e., where addition is restricted to functions of the same type (\(+_{t}\)), and function composition to situations where the output type of the first function matches the input type of the second (\(\circ _{t}\)):

Theorem 2

Properties of Typed Operations

is a typed algebra, i.e., an algebraic structure that is closed under well-typed operations.

2.3 Piece-wise affine functions

Piece-wise affine functions (PAFs) are studied extensively in tropical geometry (introductory book [36], in the context of machine learning [38]), are used in interpolation (given their strong connection to splines and Riemann integrals), and are more and more analyzed with respect to their connection to neural networks [4, 13, 19, 28–30, 39, 43, 44, 46, 48, 55–57] (specifically to PLNNs, see Definition 7). PAFs are usually defined over a polyhedral partitioning of the pre-image space [9, 22, 41]. Polyhedra arise by intersecting halfspaces:

Definition 2

Hyperplanes and Halfspace

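Let \(\vec{w} \in \mathbb{R}^{n}\) and \(b \in \mathbb{R}\). Then the set

$$ p = \mathopen{\{}\, \vec{x} \in \mathbb{R}^{n} \mid \vec{w}^{\top } \vec{x} + b = 0 \, \mathclose{\}} $$

is called a hyperplane of \(\mathbb{R}^{n}\). A hyperplane partitions \(\mathbb{R}^{n}\) into two convex subspaces, called halfspaces. The positive and negative halfspaces of \(p\), respectively, are defined as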

Definition 3

Polyhedron

A polyhedron is the intersection of \(k\) halfspaces for some natural number \(k\).

Definition 4

Piece-wise Affine Function

A function \(\psi : \mathbb{R}^{n} \to \mathbb{R}^{m}\) is called piece-wise affine if it can be written as

$$ \psi (\vec{x})= \alpha _{i}(\vec{x}) \; \text{ for } \; \vec{x} \in Q_{i} $$

where \(Q = \mathopen{\{}\, Q_{1},\dots ,Q_{k} \, \mathclose{\}}\) is a set of polyhedra that partitions the space \(\mathbb{R}^{n}\) and \(\alpha _{1}, \dots , \alpha _{k}\) are affine functions. We call \(\alpha _{i}(\vec{x})=\boldsymbol{W}_{i} \vec{x} +\vec{b}_{i}\) with \(1\leq i\leq k\) the function associated with polyhedron \(Q_{i}\).

2.3.1 Norms and distances

Throughout this work, we will often be concerned with the behavior of neural networks and how it changes when a point is slightly altered. Thus, we will often be concerned with different neighborhoods of points. These are formalized in mathematics using metric spaces and normed spaces [37]. For our purposes, however, a special type of normed spaces defined by so-called \(l\)-norms is sufficient:

Definition 5

L-Norms

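For \(m \geq 1\), the \(l_{m}\)-norm is the function \(\mathopen{\lVert }\,\cdot \,\mathclose{\rVert }_{m} : \mathbb{R}^{n} \to \mathbb{R}\) defined by:

$$ \mathopen{\lVert }\vec{x}\mathclose{\rVert }_{m} = \Bigl( \sum _{i=1}^{n} \lvert x_{i} \rvert ^{m} \Bigr)^{\frac{1}{m}} $$

Important \(l\)-norms are the \(l_{1}\) norm

$$ \mathopen{\lVert }\vec{x}\mathclose{\rVert }_{1} = \sum _{i=1}^{n} \lvert x_{i} \rvert $$

and the \(l_{2}\) norm or Euclidean norm (Footnote 1)

$$ \mathopen{\lVert }\vec{x}\mathclose{\rVert }_{2} = \sqrt{\sum _{i=1}^{n} x_{i}^{2}} $$

Another important norm is the so-called \(l_{\infty}\) norm. While not technically an \(l\)-norm according to the definition above, it arises naturally as the limit of the \(l\)-norms as \(m\) approaches infinity and is defined by:

$$ \mathopen{\lVert }\vec{x}\mathclose{\rVert }_{\infty } = \max _{1 \leq i \leq n} \lvert x_{i} \rvert $$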

With these definitions we can now formalize the neighborhood of a point.

Definition 6

Unit Ball

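For a given \(l_{m}\)-norm \(\mathopen{\lVert }\,\cdot\,\mathclose{\rVert }_{m}\) we define the corresponding \(m\)-unit ball as

$$ B_{m}^{n} = \mathopen{\{}\, \vec{x} \in \mathbb{R}^{n} \mid \mathopen{\lVert }\vec{x}\mathclose{\rVert }_{m} \leq 1 \, \mathclose{\}} \tag{1} $$

The unit ball is a closed, convex subset of \(\mathbb{R}^{n}\) centered at the origin. It generalizes the notion of a disk with radius 1 to both higher dimensions (\(n > 2\)) and non-Euclidean spaces (\(m \neq 2\)). The relevant unit balls for this paper are illustrated in Fig. 1. Every generic \(m\)-ball of \(\mathbb{R}^{n}\) can be constructed using translation

$$ \vec{x} + B_{m}^{n} = \mathopen{\{}\, \vec{x} + \vec{y} \mid \vec{y} \in B_{m}^{n} \, \mathclose{\}} $$

and scaling

$$ r B_{m}^{n} = \mathopen{\{}\, r \cdot \vec{y} \mid \vec{y} \in B_{m}^{n} \, \mathclose{\}} $$

of unit \(m\)-balls. In the latter case \(r\) is called the radius. If it is clear from the context, we omit the dimensionality of the \(m\)-ball.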
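To make the norms and balls concrete, the following Python sketch computes \(l_{m}\)-norms and tests membership in a translated and scaled ball. The function names `l_norm` and `in_ball` are hypothetical and serve only this illustration.

```python
import numpy as np

def l_norm(x, m):
    """l_m norm for m >= 1; m = np.inf gives the maximum norm."""
    x = np.asarray(x, dtype=float)
    if np.isinf(m):
        return float(np.max(np.abs(x)))
    return float(np.sum(np.abs(x) ** m) ** (1.0 / m))

def in_ball(y, center, radius, m):
    """Check whether y lies in the closed ball center + radius * B_m."""
    diff = np.asarray(y, dtype=float) - np.asarray(center, dtype=float)
    return l_norm(diff, m) <= radius

v = [3.0, -4.0]
norms = {m: l_norm(v, m) for m in (1, 2, np.inf)}

# The same point can lie inside the infinity-ball but outside the 1-ball.
point = [0.6, 0.6]
inside_inf = in_ball(point, [0.0, 0.0], 1.0, np.inf)
inside_one = in_ball(point, [0.0, 0.0], 1.0, 1)
```

The last two lines reflect the shapes in Fig. 1: the \(\infty \)-ball is a square that contains the diamond-shaped 1-ball.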

Fig. 1
The two-dimensional \(m\)-balls \(B_{m}^{2}\) for \(m=1\) (top left), \(m=2\) (top right) and \(m=\infty \) (bottom)

2.4 Neural networks

The following brief introduction to neural networks is based on [20], but in its presentation adapted to better fit the context of this work.

Neural networks are perhaps today’s most important machine learning models and are most succinctly characterized by their layered structure. There exist numerous neural network architectures that one might consider. For this work, we focus on the very general class of fully connected neural networks and define neural networks as follows:

Definition 7

Piece-wise Linear Neural Networks

A piece-wise linear neural network \(\nu \) with \(l\) hidden layers is a machine learning model consisting of an alternating sequence of \(l+1\) affine preactivation functions \(\alpha _{i}\) and \(l\) ReLU activation functions \(\phi \):

$$ \nu = \alpha _{l+1} \mathbin {;}\phi \mathbin {;}\alpha _{l} \mathbin {;}\dots \mathbin {;}\phi \mathbin {;}\alpha _{1} $$

For the PLNN to be syntactically correct, the affine functions must be compatible, i.e., the output dimension of each preactivation must match the input dimension of the following one.

In accordance with standard neural network terminology, we call the combination of a preactivation with its activation \(\phi \circ \alpha _{i}\) the \(i\)-th layer of the neural network.

In the following paragraphs preactivations and activations are properly introduced. After that, the semantics of a PLNN can be defined in terms of its components. Lastly, a common complexity measure of PLNN is presented.

Preactivations

In traditional applications, the concrete affine functions \(\alpha _{k}\) of a PLNN \(\nu \), as defined in Definition 7, would result from a training process, usually using gradient descent based optimization techniques [20], where the PLNN is trained to accurately predict desired outputs on a given dataset.

Activations

The activation function \(\phi \) is an architectural design choice made a priori by the user. The primary purpose of the activation function is to introduce non-linearity into the neural network, which can drastically increase the number of functions that can be approximated. For the purposes of this paper, we exclusively use the rectified linear unit (ReLU) function.

ReLU

The ReLU function \(\phi \) (cf. Fig. 2) has proven to be a successful activation function in practical applications, combining convenient properties of linear functions with a sufficient degree of non-linearity. It is prominently recommended as the default choice of activation function for fully connected neural networks [20]. Furthermore, due to the simple structure of the ReLU function, neural networks with ReLU activations lend themselves well to formal analysis and are typically considered in verification tasks [6, 31].

Fig. 2
The function plot of the ReLU function

Definition 8

Rectified Linear Unit

The one-dimensional ReLU function is defined as the positive part of its argument:

$$ \phi _{1}(x)= \max (0,x) $$

The \(k\)-dimensional ReLU function is the elementwise application of the one-dimensional ReLU function:

$$ \phi _{k} \big( (x_{1}, x_{2},\dots ,x_{k})^{\top }\big) = \big( \phi _{1}(x_{1}), \phi _{1}(x_{2}),\dots , \phi _{1}(x_{k}) \big)^{ \top } $$

As the ReLU activation function is the only activation function we consider, we will use \(\phi \) for the remainder of this paper exclusively to refer to the ReLU function and omit the explicit mention of the dimensionality when it is clear from context.

Definition 9

PLNN Semantics

The semantics of a piece-wise linear neural network \(\nu \) is a piece-wise affine function given by the sequential evaluation of its layers:

$$ \mathopen{[\!\![}\nu \mathclose{]\!\!]} = \alpha _{l+1} \circ \phi \circ \alpha _{l} \circ \dots \circ \phi \circ \alpha _{1} $$

For evaluation, a vector is passed layer by layer through the PLNN. The data flow is unidirectional and, using the above notation, runs from right to left.
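The layer-by-layer evaluation can be sketched in a few lines of Python. The representation of a PLNN as a list of \((\boldsymbol{W}, \vec{b})\) pairs and the names `relu`, `plnn_forward` and `abs_net` are assumptions made for this illustration only.

```python
import numpy as np

def relu(x):
    """Element-wise ReLU, the positive part of each component."""
    return np.maximum(0.0, x)

def plnn_forward(layers, x):
    """Evaluate a PLNN given as a list of (W, b) pairs ordered from the
    first preactivation to the last. ReLU is applied after every
    preactivation except the last one."""
    for W, b in layers[:-1]:
        x = relu(W @ x + b)
    W, b = layers[-1]
    return W @ x + b

# A tiny network with one hidden layer that computes |x1 - x2|:
# relu(x1 - x2) + relu(x2 - x1).
abs_net = [
    (np.array([[1.0, -1.0], [-1.0, 1.0]]), np.zeros(2)),
    (np.array([[1.0, 1.0]]), np.zeros(1)),
]

out = plnn_forward(abs_net, np.array([3.0, 5.0]))
```

The example network realizes the same piece-wise affine function \(\lvert x_{1} - x_{2} \rvert \) that is used as the TADS example in Fig. 4.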

Note that, given the close relationship between a PLNN’s syntax and semantics, many works in deep learning choose not to clearly separate the syntax and semantics of PLNNs. A transition between the two definitions can be easily achieved by replacing ‘;’ with ‘∘’ in Definition 7.

Traditionally, neural networks are visualized as computation graphs, where the nodes are the eponymous “neurons”. There, each affine function is visualized as a bipartite graph connecting \(n\) input neurons to \(m\) output neurons. An example of such graph is given in Fig. 3.

Fig. 3
A simple PLNN with two hidden layers and ReLU activations

From the representation used in Definition 7, the number of neurons of a neural network can be computed through the preactivations as follows: Let \(\nu = \alpha _{l+1} \mathbin {;}\dots \mathbin {;}\alpha _{1}\) with \(\alpha _{i} : \mathbb{R}^{n_{i}} \to \mathbb{R}^{n_{i+1}}\). Then the total number of neurons of \(\nu \) is given by

$$ \sum _{i=1}^{l+1} n_{i+1} $$

The number of neurons is a natural measure of “size” in a neural network, and it is well known that the semantic complexity of functions – measured in the number of linear regions that are needed to characterize them – that a neural network can represent increases exponentially in its number of neurons [7, 18, 39].

2.4.1 Neural network classifiers

As defined in Definition 7, PLNNs are fundamentally representations of continuous functions \(\mathbb{R}^{n} \to \mathbb{R}^{m}\). However, they are frequently employed in classification tasks where the co-domain is instead a discrete set of classes \(\{1,\dots , c\}\). To bridge this gap, one typically proceeds by training a neural network and associating each component \(y_{i}\) of its output \(\vec{y}=\nu (\vec{x})\) with the \(i\)-th class. Then, the class with the largest \(y_{i}\) is chosen for classification.

This is formalized by the argmax function.

Definition 10

Argmax

The \(k\)-dimensional argmax function

$$ {\operatorname*{arg\,max}}_{k} : \mathbb{R}^{k} \to \{1,\dots , k\} $$

is defined as

$$ \operatorname*{arg\,max}(x_{1},\dots , x_{k}) = j $$

iff \(j\) is the smallest index for which \(x_{j} \geq x_{i}\) holds for all \(1 \leq i \leq k\).

Again, when it is clear from context, we omit the index denoting the dimensionality and simply write \(\operatorname*{arg\,max}\).
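The tie-breaking rule of Definition 10 (the smallest maximal index wins) can be illustrated with a short Python sketch. The name `argmax_1based` is hypothetical; the sketch relies on NumPy's documented behavior that `np.argmax` returns the first occurrence of the maximum.

```python
import numpy as np

def argmax_1based(x):
    """Smallest 1-based index of a maximal component of x."""
    return int(np.argmax(np.asarray(x))) + 1

unique_max = argmax_1based([0.2, 0.1, 0.9])
tie = argmax_1based([0.1, 0.7, 0.7])  # ties resolve to the smaller index
```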

As described before, the argmax function can be used to convert PLNNs into classifiers. This naturally leads us to define PLNN classifiers.

Definition 11

PLNN Classifiers

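For a PLNN \(\nu \) with \(\mathopen{[\!\![}\nu \mathclose{]\!\!]} : \mathbb{R}^{n} \to \mathbb{R}^{c}\), the corresponding PLNN classifier \(\nu _{c} : \mathbb{R}^{n} \to \{1,\dots , c\}\) is defined as

$$ \nu _{c} = {\operatorname*{arg\,max}} \circ {\mathopen{[\!\![}\nu \mathclose{]\!\!]}} $$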

2.5 Typed affine decision structures

Central to our explanation approach is a decision-tree-like data structure that we call Typed Affine Decision Structure (TADS). Based on the transformation process presented in [45], it is possible to transform a PLNN \(\nu \) into a semantically equivalent TADS \(\theta (\nu )\). The transformation is based on the common syntactical representation of PLNNs and is compositional in the layers.

The explanation and verification of PLNNs is challenging because of their complex data flow [45].

Data structure

Skipping implementation details, TADS can be introduced intuitively using decision trees. In a decision tree, one distinguishes two types of nodes:

  1. Inner nodes have decision predicates. For every possible evaluation of that predicate, the node has exactly one successor.

  2. Leaves are elements from a given universe that one wants to distinguish.

For TADS, specifically, leaves are from the universe of affine functions and decision predicates are affine inequalities (Footnote 2). An example of a TADS can be found in Fig. 4. TADS structurally resemble decision trees, which are widely considered explainable machine learning models, i.e., they can, by virtue of their structure, be understood by a human [26].

Fig. 4
A simple TADS, implementing the piece-wise affine function \(\mathopen{\lvert }x_{1} - x_{2}\mathclose{\rvert }\)

Based on this introduction using decision trees one can straightforwardly define TADS.

Definition 12

TADS

A TADS \(t = (N, \rightarrow , \zeta )\) is a decision DAG (Footnote 3) \((N, \rightarrow , \zeta )\) with root \(\zeta \) whose nodes \(N\) have one of the following two types:

  1. Inner nodes are called decisions or predicates. They consist of an affine inequality and two successors, one if the predicate is true and one if not.

  2. Leaves are also called terminals. They are affine functions and have no successors.

To be syntactically correct, all nodes (i.e., all inequalities and affine functions) must accept input vectors with a fixed number of entries. This is called the input dimension of the TADS. Similarly, all terminals must map input vectors into a common output space. The dimensionality of this output space is called the output dimension (Footnote 4). For given input dimension \(n\) and output dimension \(m\) we define the set of all TADS as .

TADS are sequentially evaluated like a decision tree.

Definition 13

TADS Evaluation

The semantic function of TADS (Footnote 5)

is inductively defined as

for a TADS \(t = (N, \to , \zeta )\), with \(p,p',\alpha \in N\). For convenience we introduce the shorthand .
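The sequential evaluation can be sketched in Python as follows. This is a minimal illustration, not the implementation of [45]: the classes `Leaf` and `Decision`, the function `evaluate`, and the concrete predicate form \(\vec{w}^{\top } \vec{x} + c > 0\) are assumptions made for this sketch. The example encodes the function \(\lvert x_{1} - x_{2} \rvert \) of Fig. 4.

```python
import numpy as np
from dataclasses import dataclass
from typing import Union

@dataclass
class Leaf:
    W: np.ndarray  # terminal: the affine function x -> W x + b
    b: np.ndarray

@dataclass
class Decision:
    w: np.ndarray  # predicate (assumed form): w . x + c > 0
    c: float
    if_true: "Node"
    if_false: "Node"

Node = Union[Leaf, Decision]

def evaluate(node, x):
    """Follow the predicates from the root to a terminal, then apply
    the affine function found there."""
    while isinstance(node, Decision):
        node = node.if_true if node.w @ x + node.c > 0 else node.if_false
    return node.W @ x + node.b

# |x1 - x2|: if x1 - x2 > 0 return x1 - x2, otherwise return x2 - x1.
abs_tads = Decision(
    w=np.array([1.0, -1.0]),
    c=0.0,
    if_true=Leaf(np.array([[1.0, -1.0]]), np.array([0.0])),
    if_false=Leaf(np.array([[-1.0, 1.0]]), np.array([0.0])),
)

value = evaluate(abs_tads, np.array([3.0, 5.0]))
```

Because successors are plain object references, two decisions may point to the same successor object, which gives the DAG-style sharing of Definition 12 without further machinery.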

Semantically, both PLNNs and TADS represent piecewise affine functions. Moreover, PLNNs can be transformed into TADS:

Lemma 2

Trinity: PLNNs, TADS, and PAFs

There exists a semantics preserving transformation

from PLNNs to TADS, such that the following diagram commutes:

[Commutative diagram relating PLNNs, TADS, and PAFs]

Algebraic properties

Much like ADDs and BDDs, TADS inherit the algebraic properties of their leaf algebra. For TADS, the leaf algebra (affine functions) forms a vector space. Using lifting, one can directly implement the vector space operations on TADS [45].

Lemma 3

Lifting

Lifting addition (+) and scalar multiplication (⋅) from affine functions to TADS gives semantically equivalent operators to their PAF counterparts, i.e., for all TADS

By the lifting theorem of [45] the algebraic properties are preserved and thus:

Theorem 3

TADS vector space

TADS form a vector space.
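The idea of lifting can be illustrated with a small, unoptimized Python sketch for addition on decision trees. The tuple encoding, the predicate form \(\vec{w}^{\top } \vec{x} + c > 0\), and the names `lift_add` and `evaluate` are assumptions of this sketch; it performs none of the reductions or node sharing that an actual implementation such as [45] would apply.

```python
import numpy as np

# Tree encoding assumed for this sketch only:
#   leaf:     ("leaf", W, b)                      the function x -> W x + b
#   decision: ("dec", w, c, if_true, if_false)    predicate w . x + c > 0

def lift_add(t1, t2):
    """Push the addition down to the leaves: descend through the
    decisions of t1 first, then through those of t2."""
    if t1[0] == "dec":
        _, w, c, hi, lo = t1
        return ("dec", w, c, lift_add(hi, t2), lift_add(lo, t2))
    if t2[0] == "dec":
        _, w, c, hi, lo = t2
        return ("dec", w, c, lift_add(t1, hi), lift_add(t1, lo))
    return ("leaf", t1[1] + t2[1], t1[2] + t2[2])

def evaluate(t, x):
    while t[0] == "dec":
        _, w, c, hi, lo = t
        t = hi if w @ x + c > 0 else lo
    return t[1] @ x + t[2]

zero = ("leaf", np.zeros((1, 2)), np.zeros(1))
relu_x1 = ("dec", np.array([1.0, 0.0]), 0.0,
           ("leaf", np.array([[1.0, 0.0]]), np.zeros(1)), zero)
relu_x2 = ("dec", np.array([0.0, 1.0]), 0.0,
           ("leaf", np.array([[0.0, 1.0]]), np.zeros(1)), zero)

total = lift_add(relu_x1, relu_x2)
points = [np.array(p) for p in [(2.0, -3.0), (-1.0, 4.0), (1.5, 2.5), (-1.0, -1.0)]]
sums_match = all(
    np.allclose(evaluate(total, p), evaluate(relu_x1, p) + evaluate(relu_x2, p))
    for p in points
)
```

On every test point the lifted sum agrees with the pointwise sum of the two operands, which is the semantic equivalence stated in Lemma 3 for addition.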

It is well known that piece-wise affine functions are closed under composition. Even though this operator cannot be directly lifted, it can be easily implemented on TADS [45].

Theorem 4

TADS Composition

TADS composition

is defined such that for all , :

It follows straightforwardly that:

Theorem 5

TADS Monoid

TADS form a monoid.

The composition operator ⋈ is especially important in the context of neural networks, as neural networks are inherently compositions of piece-wise affine functions.

2.6 Principal component analysis

Principal Component Analysis (PCA) is one of the most popular techniques for dimensionality reduction and feature extraction [1, 8, 54]. At a high level, it seeks to find, for a given dataset \(D \subset \mathbb{R}^{n}\), a linear subspace \(V \subseteq \mathbb{R}^{n}\) with dimension \(\dim (V)\ll n\) that can be used to encode \(D\) with as little reconstruction loss as possible.

Such an encoding is useful for machine learning algorithms as it can drastically reduce the input dimension. Large input dimensions can be very problematic in machine learning and entail numerous potential problems, altogether known as the curse of dimensionality [50].

The fundamental objects of PCA are the eponymous principal components that are defined as follows:

Definition 14

Principal Components

For a given dataset \(D = \{\vec{d}_{1},\dots ,\vec{d}_{j}\} \subset \mathbb{R}^{n}\) with \(j \geq n\) and zero mean \(\sum _{\vec{d} \in D} \vec{d} = \vec{0}\), there exist \(n\) principal components which are characterized as iterative solutions to the following optimization problem: The \(i\)-th principal component \(\vec{p_{i}}\) maximizes the variance of the data when it is projected onto \(\vec{p_{i}}\):

$$ \sum _{d \in D} \mathopen{\langle }\vec{p_{i}},\vec{d} \mathclose{\rangle }^{2} \rightarrow \max $$

under the constraint that \(\vec{p_{i}}\) has unit length

$$ \mathopen{\lVert }\vec{p}_{i}\mathclose{\rVert }_{2} =1 $$

and is orthogonal to all previous principal components

$$ \forall h < i : \mathopen{\langle }\vec{p_{i}},\vec{p_{h}}\mathclose{\rangle } = 0 $$

Note that every dataset \(D\) with non-zero mean, i.e., \(\sum _{\vec{d} \in D} \frac{\vec{d}}{|D|} = \vec{\mu}\) with \(\vec{\mu}\neq \vec{0}\), can be made to obey the restriction \(\vec{\mu}=\vec{0}\) by performing the following transformation on each datapoint: \(\vec{d}_{i}' = \vec{d}_{i} - \vec{\mu}\). By definition, the principal components are pair-wise orthogonal and normalized and therefore linearly independent. Thus, they form a basis of \(\mathbb{R}^{n}\). It follows that there is a natural, unique representation based on the principal components \(\rho (\vec{x})=(r_{1},\dots , r_{n})^{\top }\) such that:

$$ \vec{x} = \sum _{i=1}^{n} r_{i} \vec{p}_{i} $$

In particular, in the case of PCA, the \(r_{i}\) can be computed as

$$ r_{i} = \mathopen{\langle }\vec{p}_{i},\vec{x} \mathclose{\rangle } \ . $$

With this, PCA can naturally be used as a dimensionality reduction tool.

Definition 15

PCA Dimensionality Reduction

Let \(0< k< n\). For some \(\vec{x} \in \mathbb{R}^{n}\) with \(\rho (\vec{x})=(r_{1},\dots , r_{n})^{\top }\), the \(k\)-dimensional PCA representation is given by cutting off the PCA representation after the \(k\)-th element:

$$ \rho _{k}(\vec{x})=(r_{1},\dots , r_{k})^{\top } $$

Consequently, the \(k\)-dimensional PCA reconstruction to \(\vec{x}\) is given as:

$$ \vec{x} \approx \theta _{k}(\rho _{k}(\vec{x}))= \sum _{i=1}^{k} r_{i} \vec{p}_{i} $$

As \(\rho _{k}\) is a projection for \(k < n\), it loses information. Therefore, the PCA reconstruction after dimensionality reduction is approximative, as visualized in Fig. 6.
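The encoding \(\rho _{k}\) and the reconstruction \(\theta _{k}\) can be sketched with NumPy. The sketch obtains the principal components from a singular value decomposition of the centered data matrix, which is one standard way to compute them; the function names and the synthetic dataset are assumptions made for illustration.

```python
import numpy as np

def principal_components(D):
    """Rows of D are zero-mean datapoints. The right singular vectors,
    ordered by decreasing singular value, are the principal components
    (returned as rows)."""
    _, _, Vt = np.linalg.svd(D, full_matrices=False)
    return Vt

def encode(P, x, k):
    """rho_k: the first k coordinates r_i = <p_i, x>."""
    return P[:k] @ x

def reconstruct(P, r):
    """theta_k: the sum of r_i * p_i over the first len(r) components."""
    return P[: len(r)].T @ r

# Synthetic data that scatters mostly along one axis.
rng = np.random.default_rng(0)
D = rng.normal(size=(200, 3)) * np.array([5.0, 1.0, 0.1])
D = D - D.mean(axis=0)  # enforce zero mean

P = principal_components(D)
x = D[0]
errors = [np.linalg.norm(x - reconstruct(P, encode(P, x, k))) for k in (1, 2, 3)]
```

The reconstruction error can only shrink as more components are kept, and it vanishes (up to rounding) once all \(n\) components are used.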

In essence, the composition of PCA encoding and reconstruction forms a function that is close to the identity function on the dataset and its surrounding points while reducing the number of dimensions needed to express the data. The success of PCA is heavily dependent on the dataset being mainly distributed along a linear subspace of \(\mathbb{R}^{n}\) and its generalization performance requires that new data follow the same distribution as the training data. However, if these assumptions hold, it is a very good approximation, as indicated by the following defining property of the principal components:

Lemma 4

The principal components are exactly those vectors that make the reconstruction error minimal over \(D\) among all linear, orthogonal encoders and decoders using \(k\) dimensions [1]. I.e., for all orthogonal, linear functions \(e : \mathbb{R}^{n} \to \mathbb{R}^{k}\), \(d : \mathbb{R}^{k} \to \mathbb{R}^{n}\), the term

$$ \sum _{\vec{x} \in D} \bigl\lVert \vec{x} - d(e(\vec{x}))\bigr\rVert _{2}^{2} $$

is minimal if \(e = \rho _{k}\), \(d=\theta _{k}\).

PCA is attractive for multiple reasons. First, PCA representations and approximations are linear functions, which makes them easy to work with. Second, PCA supports reductions to \(k\) dimensions for any \(0< k\leq n\), which makes PCA very flexible. Lastly, but perhaps most importantly, PCA is a well-understood and well-proven method in practice and can elegantly enable strong performance in even relatively simple machine learning models. An example of a PCA encoding and reconstruction is shown in Figs. 5a to 5c.

Fig. 5
Example for PCA dimensionality reduction. A set of points \(X\) (shown in blue) follows a multinormal distribution that scatters more along one axis. This axis is very closely resembled by the first principal component (shown in green). Through orthogonal projection one can reduce the dimensionality (b), and the reconstruction (c) is very close to the original (Color figure online)

Fig. 6
Reconstruction of 784-pixel MNIST digits with various numbers of principal components, showcasing how a PCA reconstruction can faithfully reconstruct an original input based on very little information (\(k=16\) principal components vs. 784 pixels). Original MNIST digits in the first row; the following rows show reconstructions with \(k=2, 8, 16\) principal components. Vectors are visualized using a perceptually uniform diverging color palette (Seaborn’s “icefire”)

Fig. 7
Illustration of two robustness scenarios using its geometric interpretation. In (a) robustness is achieved while in (b) robustness is violated as the \(\infty \)-ball around \(\vec{x}\) intersects with the decision boundary (Color figure online)

3 Problem setting: robustness on MNIST

3.1 Introduction to MNIST

In the remainder of the paper we consider the problem of digit recognition using the MNIST dataset [16]. The MNIST dataset provides a traditional baseline-problem scenario for machine learning. While simpler than modern, large-scale machine learning tasks, MNIST requires PLNNs of relevant size for satisfactory classification and stands to this day as an introductory problem in verification benchmarks [6].

The MNIST dataset consists of 70,000 gray-scale images of hand-written digits, each labeled with the digit they represent to a human observer. The dataset is split into 60,000 examples for training and 10,000 examples for testing. Images consist of \(28\times 28\) pixels and are represented as vectors with each component \(x_{i}\) representing the gray-scale value of the \(i\)-th pixel on a scale from 0 to 1. Thus, each sample has the form

$$ (\vec{x}, l) $$

with \(\vec{x} \in [0,1]^{28\cdot 28}\) and \(l \in \{0,\ldots,9\}\). The task is to find a PLNN classifier that represents a function

$$ \nu _{c} : [0,1]^{28\cdot 28} \to \{0,\ldots ,9\} $$

assigning to each image the digit it is supposed to represent. At a baseline, \(\nu _{c}\) should classify most training examples correctly and should perform acceptably well on the test data.

A challenge for classification problems like this is to control so-called adversarial examples, as discussed in the following.

3.2 Robustness to adversarial examples

In essence, robustness is the absence of adversarial examples, which are perhaps the most well-known manifestations of chaotic behavior of neural networks and have received wide attention in research [21, 33, 49]. We work with the following definition of adversarial examples:

Definition 16

Adversarial Example

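Let \(\nu _{c} : \mathbb{R}^{n} \to \{1,\dots , c\}\) be a PLNN classifier. Further, let \(\vec{x} \in \mathbb{R}^{n}\) be a given point of interest that is correctly classified by \(\nu _{c}\). Then, \(\vec{y} \in \mathbb{R}^{n}\) is an \(\epsilon \)-adversarial example to \(\vec{x}\) iff

$$ \lVert \vec{y} - \vec{x} \rVert _{\infty } \leq \epsilon \quad \text{and} \quad \nu _{c}(\vec{y}) \neq \nu _{c}(\vec{x}) $$

If \(\nu _{c}\) admits no \(\epsilon \)-adversarial examples for a given input \(\vec{x}\), then it is called \(\epsilon \)-robust around \(\vec{x}\).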

Intuitively, an adversarial example is a slight perturbation of an input that, although minor, changes the neural network’s prediction. Note that in image recognition problems such as MNIST, the restriction \(\lvert \! \lvert {\vec{y} - \vec{x}} \rvert \!\rvert _{\infty } \leq \epsilon \) encodes that between \(\vec{x}\) and \(\vec{y}\), each pixel can only differ by at most \(\epsilon \).
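Checking whether a concrete candidate point is an \(\epsilon \)-adversarial example is straightforward, as the following Python sketch shows. The function `is_adversarial` and the toy classifier `toy_classifier` (an argmax applied directly to a two-dimensional input) are hypothetical and stand in for an arbitrary PLNN classifier. Note that this only checks a single candidate; proving robustness requires reasoning about all points of the \(\epsilon \)-ball.

```python
import numpy as np

def is_adversarial(classify, x, y, eps):
    """True iff y is within eps of x in the l_inf norm and is
    classified differently from x."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    close = np.max(np.abs(y - x)) <= eps
    return bool(close and classify(y) != classify(x))

def toy_classifier(v):
    """Stand-in classifier: 1-based argmax of the input itself."""
    return int(np.argmax(v)) + 1

x = [0.6, 0.5]
flips_class = is_adversarial(toy_classifier, x, [0.52, 0.58], eps=0.1)
same_class = is_adversarial(toy_classifier, x, [0.62, 0.5], eps=0.1)
too_far = is_adversarial(toy_classifier, x, [0.0, 1.0], eps=0.1)
```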

In practice, adversarial examples can be almost imperceptible to a human [35, 49] while arbitrarily altering previously correct decisions, sometimes yielding outlandish classification results, which may enable outside attacks on neural network systems. Thus, it is critical that neural networks cannot be adversarially attacked at points where the desired semantics is clear (Footnote 6).

3.3 Verifying robustness

Generally, PLNN verification is the task of proving a property for the result of a PLNN where the input is restricted to a given domain [3, 11]. Formally, let be a PLNN, a restriction of the input domain, and a predicate. Then PLNN verification is the task of proving, or refuting with a counterexample, the formula

(2)

For the case of verifying \(\epsilon \)-robustness around \(\vec{x}\) for , we can formulate (2) specifically as (cf. Definition 16)

State-of-the-art verification tools use different methods, such as [11]:

  • Satisfiability Modulo Theories

  • Mixed Integer Programming

  • Branch and Bound

For more information, see Sect. 7 on related work.

4 Extending TADS to cover robustness properties

TADS are characterized by:

  1. Global explanations, i.e., they explain the behavior of a PLNN over the entire space of possible inputs. Robustness properties, however, concern only the relatively small neighborhood \(\vec{x} + \epsilon B_{\infty}\) of a point \(\vec{x}\).

  2. Regression behavior, i.e., they represent a continuous function. With respect to robustness, however, we are interested in the behavior of the associated PLNN classifier.

The following two subsections will show that TADS are nevertheless well suited to deal with robustness properties.

4.1 Precondition projection on TADS

When studying adversarial examples, one may use the strict preconditions (given as infinity balls \(\epsilon B_{\infty}\)) to reduce the workload. Given the strong connection between affine functions and (convex) polytopes, it is a straightforward procedure to apply polyhedral preconditions, such as the infinity balls required for robustness properties, on TADS. Note that stronger preconditions result in less work.

Given a TADS \(t\) representing a piece-wise affine function \(f\), we are interested in the behavior of \(f\) on a given (small) polyhedron \(S\). In other words, we are interested in the restriction of \(f\) to \(S\), i.e., the function that agrees with \(f\) on \(S\) and is undefined (\(\bot \)) elsewhere.

Technically, this is implemented by encoding the polytope \(S\) as a TADS whose decision nodes test the affine inequalities that define \(S\).

By explicitly eliminating paths that lead to \(\bot \) (see [45]), the resulting TADS is significantly reduced in size.
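The effect of a precondition on individual decision nodes can be illustrated with a small sketch. It assumes, hypothetically, that a decision node tests a predicate of the form \(\vec{w} \cdot \vec{x} + b \geq 0\) and that the precondition is an infinity ball; the function names are illustrative and not taken from the TADS implementation of [45]. A node whose predicate is constant on the ball has one unreachable branch. Note that this is only a cheap per-node filter: deciding the feasibility of a whole path (a conjunction of such predicates) would in general require a linear program.

```python
from typing import Optional, Sequence, Tuple


def affine_range_on_ball(w: Sequence[float], b: float,
                         center: Sequence[float], eps: float) -> Tuple[float, float]:
    """Return (min, max) of w.x + b over the box center + eps * B_inf."""
    mid = sum(wi * ci for wi, ci in zip(w, center)) + b
    spread = eps * sum(abs(wi) for wi in w)  # eps * ||w||_1
    return mid - spread, mid + spread


def decide_on_ball(w: Sequence[float], b: float,
                   center: Sequence[float], eps: float) -> Optional[bool]:
    """Decide the predicate w.x + b >= 0 on the whole ball, if possible.

    Returns True or False if the predicate is constant on the ball
    (so one branch of the node is unreachable), and None if both
    branches are reachable from inside the ball.
    """
    lo, hi = affine_range_on_ball(w, b, center, eps)
    if lo >= 0:
        return True
    if hi < 0:
        return False
    return None
```

For example, `decide_on_ball([1.0, 0.0], -2.0, [0.0, 0.0], 0.5)` reports that the predicate \(x_{1} - 2 \geq 0\) is false everywhere on the ball of radius 0.5 around the origin, so the corresponding true-branch could be pruned.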

4.2 Argmax on TADS for classification

Neural networks are frequently used for classification, as outlined in Sect. 2.4.1. As described there, the neural network classifier associated with a given neural network \(\nu \) can naturally be modeled as

$$ \nu _{c} = {\operatorname*{arg\,max}} \circ {\mathopen{[\!\![}\nu \mathclose{]\!\!]}} $$

Interpreting a neural network's behavior in this way drastically changes its nature, and if one seeks to analyze a neural network that is meant to be used as a neural network classifier, it is important to analyze it with respect to its classification behavior.

We know how to construct a TADS \(t_{\nu}\) for any PLNN \(\nu \). On the other hand, it is also easy to see how a TADS \(t_{a}\) can be constructed for \({\operatorname*{arg\,max}}\): Intuitively, such a TADS need only perform a linear search for the maximum of \({\vec{x}=(x_{1},\dots ,x_{n})}\) from \(x_{1}\) to \(x_{n}\). Figure 8 illustrates this for the three-dimensional argmax.Footnote 7 This TADS first compares \(x_{1}\) and \(x_{2}\) in the first layer, then compares their maximum with \(x_{3}\) to attain the result. The extension to higher dimensions is straightforward.

Fig. 8

The TADS \(t_{a}\), representing the argmax function with 3 variables
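The linear search performed by the argmax TADS can be sketched in a few lines. The following is only an illustration under assumptions: indices are 0-based, ties are resolved in favor of the earlier index (the tie-breaking rule of the actual TADS is not specified here), and the function name is hypothetical. Each recorded comparison \(x_{i} > x_{\mathit{best}}\) corresponds to an affine predicate \(x_{i} - x_{\mathit{best}} > 0\), which is why the search fits into a decision structure with affine predicates.

```python
from typing import List, Sequence, Tuple


def argmax_with_path(x: Sequence[float]) -> Tuple[int, List[Tuple[int, int, bool]]]:
    """Find the index of the maximum by n-1 pairwise comparisons.

    Returns the winning index and the list of comparisons taken,
    each as (candidate index, current best index, outcome). The list
    plays the role of the path through the decision structure.
    """
    if not x:
        raise ValueError("argmax of an empty vector is undefined")
    best = 0
    path = []
    for i in range(1, len(x)):
        outcome = x[i] > x[best]  # affine predicate x[i] - x[best] > 0
        path.append((i, best, outcome))
        if outcome:
            best = i
    return best, path
```

For a three-dimensional input, the path has exactly two entries, matching the two layers of comparisons described for Fig. 8.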

Taken together, it is straightforward to construct the classification TADS \(t_{\nu _{c}}\) using TADS composition as follows:

$$ t_{\nu _{c}} = t_{\nu} \Join t_{a} $$

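The semantic correctness of this construction follows directly from the correctness of the TADS composition, i.e.:

$$ {\mathopen{[\!\![}t_{\nu _{c}}\mathclose{]\!\!]}} = {\mathopen{[\!\![}t_{a}\mathclose{]\!\!]}} \circ {\mathopen{[\!\![}t_{\nu}\mathclose{]\!\!]}} = {\operatorname*{arg\,max}} \circ {\mathopen{[\!\![}\nu \mathclose{]\!\!]}} = \nu _{c} $$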

5 Verifying robustness on MNIST using TADS

The following subsections present four approaches to robustness verification via TADS and illustrate them using the MNIST data set:

  • A straightforward approach where the considered PLNN is directly transformed into a TADS (cf. Fig. 9a). This approach typically does not scale due to the exponential explosion of the TADS transformation.

    Fig. 9

    Overview of the different approaches to verifying robustness with PCA encoding. Legend: (green) entity for usage in the real world, (blue) components only used during verification, (orange sidebar) components used for TADS construction, (mint green) components used during training and actively trained, (pale green) parts included in training but not actively learned, (olive green) parts that are not included in the training process (Color figure online)

  • An approximative approach based on PCA-based dimensionality reduction that scales and provides a good heuristic for finding adversarial examples, but is insufficient to prove robustness (cf. Fig. 9b). In this case, the TADS-based analysis only covers the subspace that can be ‘reached’ from the initial, low-dimensional PCA-based vector space via decoding and adequate basis transformation, as indicated by the blue part. Thus, this approach cannot guarantee that the analysis of the TADS is sufficient to reveal all adversarial examples of the original PLNN.

  • A transformational approach based on PCA-based dimensionality reduction, where the PLNN is extended by a preprocessing step, defined by PCA-based auto-encoding, i.e., the composition of a PCA-based dimensionality reduction followed by a linear function that embeds (decodes) the low-dimensional space into the original space (cf., Fig. 9c). Here we can show that analyses of the partial extension that start with the decoding are sufficient to obtain robustness results for the extended PLNN that is defined for the 784-dimensional space of MNIST.

  • A modification of the third approach, where the linear function defined by the composition of the decoder and the initialization layer of the original net is replaced by a linear layer to provide a network architecture with the same number of layers but with a strongly reduced input dimension (cf. Fig. 9d). The PLNN considered for verification is now given as the result of a learning process using the same sample set as in the other cases, but starting with a PCA-based reduction step. Technically, the subsequent TADS-based robustness analysis proceeds exactly in the same way as before guaranteeing that the robustness result proven for the dark green part can again be lifted to the overall net.

We will show that the third and fourth approaches allow us to prove full robustness in a computationally efficient manner. However, they come at the price of modifying the PLNN.

In our eyes, this is no disadvantage as long as the modified PLNN is still sufficiently accurate: neural networks are themselves only results of a heuristic training process and have no intrinsic merit beyond their predictive accuracy. In fact, the results shown in Fig. 12 indicate that predictive accuracy can still be achieved after a significant reduction in dimensionality, drastically easing formal verification.

5.1 Full verification with TADS

At their baseline, TADS are so-called model explanations [25] of PLNNs, i.e., for any classification PLNN \(\nu _{c}\), a corresponding TADS \(t_{\nu _{c}}\) can be generated that represents the same function as \(\nu _{c}\) in an easily comprehensible and analyzable manner. Of course, the global behavior of neural networks is usually too large to be represented with a TADS. However, in the case of robustness verification, we are only interested in the behavior of \(\nu _{c}\) in the neighborhood around some point of interest \(\vec{x}\), formalized by an infinity ball (see Definition 6). Recall from Definition 16 that \(\epsilon \)-robustness for a point \(\vec{x}\) is formalized by the property

$$ \forall \vec{y} : \mathopen{\lVert }\vec{y} - \vec{x} \mathclose{\rVert }_{\infty }\leq \epsilon \implies \nu _{c}(\vec{x}) = \nu _{c}(\vec{y}) $$

Equivalently, this problem can also be stated as

$$ \mathopen{\lvert }\nu _{c}( \vec{x} + \epsilon B_{\infty}^{n})\mathclose{\rvert } = 1 $$

that is, the neighborhood of \(\vec{x}\) defined by the infinity ball \(\epsilon B_{\infty}\) of dimension \(n\) with radius \(\epsilon \) is classified consistently as one class. This property can be verified using the following theorem:

Theorem 6

TADS Verification

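Let \(\nu _{c}\) be a PLNN classifier, \(\vec{x}\) a point of interest, and \(t\) a TADS satisfying \({\mathopen{[\!\![}t\mathclose{]\!\!]}} = \nu _{c}\). Then \(\nu _{c}\) is \(\epsilon \)-robust around \(\vec{x}\) iff the TADS obtained from \(t\) by applying the precondition \(\vec{x} + \epsilon B_{\infty}^{n}\) (cf. Sect. 4.1) contains only feasible paths to the class \(\nu _{c}(\vec{x})\).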

The correctness of this theorem follows directly from the correctness results regarding TADS that were established in Sect. 4.

The approach of directly verifying the original network, sketched at the very left of Fig. 9, only works for quite small MNIST networks. The core reason for this scaling problem is the dimensionality of MNIST: With 784-dimensional inputs, the volume of the \(\epsilon \)-ball around \(\vec{x}\) is proportional to \(\epsilon ^{784}\), which grows quickly with \(\epsilon \) and leads to intractably large TADS.

5.2 PCA guided validation

To improve scalability of TADS-based verification, one might consider approximative robustness instead. More concretely, instead of searching for adversarial examples in the full ball \(\epsilon B_{\infty}(\vec{x})\), we will present an approach that restricts the search to a lower dimensional subset \(S \subset \epsilon B_{\infty}(\vec{x})\).

This will yield an underapproximation to robustness: If an adversarial example is found in \(S\), it also exists in \(\epsilon B_{\infty}(\vec{x})\) and robustness is violated. However, the absence of adversarial examples in \(S\) does not imply the absence of adversarial examples in \(\epsilon B_{\infty}(\vec{x})\). Key to the construction of the lower-dimensional manifold \(S\) is principal component analysis (PCA) as introduced in Sect. 2.6.

Applying PCA to MNIST results in a list of \(n=784\) principal components

$$ \vec{p}_{1}, \dots , \vec{p}_{n} $$

ordered by decreasing variance along the respective axis. The first six of these are visualized in Fig. 10. Recall from Definition 15 that the principal components \(\vec{p}_{i}\) are precisely the directions along which a given dataset scatters most. They are therefore natural candidates to explore in a heuristic search for adversarial examples.

Fig. 10

The first 6 principal components of the MNIST dataset. Note that in the context of MNIST, images are just 784-dimensional vectors, and we therefore represent the PCA vectors as images. Vectors are visualized using a perceptually uniform diverging color palette (Seaborn’s “icefire”). Negative values are shown in blue, positives in red. Higher values are expressed with higher color intensity

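Let \(\vec{x}\) be some point for which we seek to find adversarial examples. Then, we can define the \(k\)-dimensional PCA space around \(\vec{x}\) as follows:

$$ U_{k}(\vec{x}) = \Bigl\{ \vec{x} + \sum _{i=1}^{k} \lambda _{i} \vec{p}_{i} \Bigm| \lambda _{1}, \dots , \lambda _{k} \in \mathbb{R} \Bigr\} $$

This space contains all vectors that are reachable from \(\vec{x}\) along the first \(k\) principal components or, equivalently, the image of the PCA decoding function \(\theta _{k}\).

This allows us to define a search space for adversarial examples:

$$ S = U_{k}(\vec{x}) \cap \bigl(\vec{x} + \epsilon B_{\infty}^{n}\bigr) $$
(3)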

Observe that \(S\) is by definition a subset of \(\vec{x} + \epsilon B_{\infty}^{n}\) of dimensionality \(k \ll n\) and that, as both \(U_{k}\) and \(\epsilon B_{\infty}^{n}\) are defined by linear constraints, it can be conveniently expressed as a TADS precondition.

Restricting the search for adversarial examples to \(S\) via a PCA-based transformation that adequately decodes the vectors of some \(k\)-dimensional ball \(\delta B_{\infty}^{k}\), as sketched in Fig. 9b, drastically reduces the computational load.Footnote 8 However, this reduction comes at a price, which is usually quite high (cf. Fig. 11): Independently of the choice of \(\delta \), it can never prove the absence of adversarial examples.
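The relation between the low-dimensional coordinates and the search space \(S\) can be made concrete with a minimal sketch. It assumes that a candidate is obtained by moving from \(\vec{x}\) along the first \(k\) principal components with coefficients \(\vec{z}\), and that membership in \(S\) is then a simple infinity-norm check. The helper names `decode` and `in_search_space` are hypothetical and only serve the illustration.

```python
from typing import List, Sequence


def decode(x: Sequence[float], components: Sequence[Sequence[float]],
           z: Sequence[float]) -> List[float]:
    """Return x + sum_i z[i] * p_i, a point of the PCA space around x."""
    point = list(x)
    for zi, p in zip(z, components):
        for j, pj in enumerate(p):
            point[j] += zi * pj
    return point


def in_search_space(x: Sequence[float], components: Sequence[Sequence[float]],
                    z: Sequence[float], eps: float) -> bool:
    """Check whether the decoded point also lies in the eps-ball around x."""
    y = decode(x, components, z)
    return max(abs(yj - xj) for yj, xj in zip(y, x)) <= eps
```

Any adversarial example found among such decoded points is a genuine adversarial example of the original network, whereas finding none says nothing about the rest of the \(\epsilon \)-ball.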

Fig. 11

The same general robustness scenario that is depicted in Fig. 7a, except that now only approximative robustness is considered

5.3 Built-in PCA verification

Fundamentally, neural networks are heuristic models that seek only to achieve high performance, which is typically defined as the accuracy of their predictions. If a neural network is only as useful as its predictive accuracy, then any change that is made to the neural network that does not drastically alter its predictive accuracy is acceptable. This opens up a new angle to neural network verification that is unlike traditional program verification: Rather than trying to verify a given network as it is, one may well alter the network as long as this does not impair the prediction quality too much. In fact, we consider such a step (often) necessary, as classifiers defined by high-dimensional neural networks will often not be robust, but small alterations may well be.

Figure 9c sketches how the idea of PCA can be used to achieve such an alteration: each input is channelled through the low-dimensional PCA space, which, similar to the situation in the previous section, is simple enough to support verification. However, we will see that, in contrast to the previous section, the special character of PCA encoding allows us to infer robustness results for the full input space from robustness results for the PCA space. More concretely, after verifying the robustness of

$$ \nu _{c} \circ \theta _{k} $$

we establish a robustness result for the full modified net

$$ \nu _{r} = \nu _{c} \circ \theta _{k} \circ \rho _{k} $$

The success of this method very much depends on the accuracy of \(\nu _{r}\), which itself strongly depends on the chosen \(k\). We will discuss this issue in Sect. 5.4.

In the remainder of this section, we show how to infer robustness results for the full \(n\)-dimensional vector space from robustness results for \(\nu _{c} \circ \theta _{k} \). The key observation for proving this property is that PCA preserves neighborhoods:

Lemma 5

PCA Preserves Neighborhoods

Let \(\rho _{k}\) be the PCA transformation of the first \(k\) principal components. For an input \(\vec{x}\) and an \(\epsilon \)-neighbor \(\vec{y}\) with \({\mathopen{\lVert }\vec{y} - \vec{x} \mathclose{\rVert }_{\infty }\leq \epsilon}\) one can estimate their distance in the image of \(\rho _{k}\) as

$$ \mathopen{\lVert }\rho _{k}(\vec{y}) - \rho _{k}(\vec{x})\mathclose{\rVert }_{\infty }\leq \epsilon \max \nolimits _{i} \mathopen{\lVert }\vec{p}_{i}\mathclose{\rVert }_{1} $$

Proof

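$$ \begin{aligned} \mathopen{\lVert }\rho _{k}(\vec{y}) - \rho _{k}(\vec{x})\mathclose{\rVert }_{\infty } &= \mathopen{\lVert }\rho _{k}(\vec{y} - \vec{x})\mathclose{\rVert }_{\infty } && \text{linearity} \\ &= \max _{i=1}^{k} \bigl\lvert \langle \vec{p}_{i}, \vec{y} - \vec{x} \rangle \bigr\rvert && \text{def. } \rho _{k}, \lVert \cdot \rVert _{\infty } \\ &= \max _{i=1}^{k} \Bigl\lvert \sum _{j=1}^{n} (\vec{p}_{i})_{j} (y_{j} - x_{j}) \Bigr\rvert && \text{def. } \langle \cdot , \cdot \rangle \\ &\leq \max _{i=1}^{k} \sum _{j=1}^{n} \lvert (\vec{p}_{i})_{j} \rvert \, \lvert y_{j} - x_{j} \rvert && \text{triangle inequality for } \lvert \cdot \rvert \\ &\leq \epsilon \max _{i=1}^{k} \sum _{j=1}^{n} \lvert (\vec{p}_{i})_{j} \rvert && \text{assumption} \\ &= \epsilon \max _{i=1}^{k} \mathopen{\lVert }\vec{p}_{i}\mathclose{\rVert }_{1} && \text{def. } \lVert \cdot \rVert _{1} \end{aligned} $$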

One can see that the bound is tight by setting

$$ \vec{y} = \vec{x} + \epsilon \operatorname{sgn}(\vec{p}_{i}) $$

where \(\operatorname{sgn}(\vec{p}_{i})\) is the sign function applied component wise to \(\vec{p}_{i}\). For that \(\vec{y}\) equality holds for all steps. □
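The bound of Lemma 5 and its tightness can be checked numerically on a toy example. The sketch below assumes that \(\rho _{k}\) maps a vector to its inner products with the principal components (a possible mean offset cancels in the difference and is omitted); all function names and the two-dimensional components are illustrative and not taken from the experiments.

```python
from typing import List, Sequence


def project(components: Sequence[Sequence[float]], v: Sequence[float]) -> List[float]:
    """rho_k(v): inner products of v with each principal component."""
    return [sum(pj * vj for pj, vj in zip(p, v)) for p in components]


def pca_distance(components, x, y) -> float:
    """Infinity-norm distance between rho_k(x) and rho_k(y)."""
    px, py = project(components, x), project(components, y)
    return max(abs(a - b) for a, b in zip(py, px))


def lemma5_bound(components, eps: float) -> float:
    """eps * max_i ||p_i||_1, the right-hand side of Lemma 5."""
    return eps * max(sum(abs(c) for c in p) for p in components)


def tight_neighbor(x, p, eps: float) -> List[float]:
    """x + eps * sgn(p): the neighbor for which the bound is attained."""
    sign = lambda a: (a > 0) - (a < 0)
    return [xj + eps * sign(pj) for xj, pj in zip(x, p)]


# Toy orthonormal components in two dimensions (illustrative values only).
components = [[0.6, 0.8], [-0.8, 0.6]]
x = [1.0, 2.0]
eps = 0.1
y = tight_neighbor(x, components[0], eps)
```

With these values, `pca_distance(components, x, y)` coincides with `lemma5_bound(components, eps)` up to floating-point rounding, while other neighbors in the \(\epsilon \)-ball stay below the bound.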

As all \(\vec{p}_{i}\) have unit length, it is possible to derive an upper bound for \(\mathopen{\lVert }\vec{p}_{i}\mathclose{\rVert }_{1}\) that holds for every PCA. The bound is attained when at least one principal component \(\vec{p}_{i}\) (with some \(1 \leq i \leq k\)) equals

$$ \vec{p}_{i} = \frac{1}{\sqrt{n}} (\pm 1, \dots , \pm 1)^{T} $$

In that case, the norm is \(\mathopen{\lVert }\vec{p}_{i}\mathclose{\rVert }_{1} = \sqrt{n}\), leading to the following corollary.

Corollary 1

For every \(k \leq n\) and every set of principal components \(\vec{p_{1}}, \dots , \vec{p_{n}}\) the PCA representation \(\rho _{k}\) satisfies

$$ {\mathopen{\lVert }\vec{y} - \vec{x} \mathclose{\rVert }_{\infty }\leq \epsilon} \implies \mathopen{\lVert }\rho _{k}(\vec{x}) - \rho _{k}(\vec{y})\mathclose{\rVert }_{\infty }\leq \epsilon \sqrt{n} $$

This suffices to prove the announced robustness result:

Theorem 7

Robustness

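Let \(\nu \) be a PLNN with associated classifier \(\nu _{c} = {\operatorname*{arg\,max}} \circ {\mathopen{[\!\![}\nu \mathclose{]\!\!]}}\). Then, let

$$ \nu '_{r} = \nu _{c} \circ \theta _{k} \quad \text{and} \quad \nu _{r} = \nu '_{r} \circ \rho _{k} $$

If \(\nu '_{r}\) is \(\delta \)-robust around \(\rho _{k}(\vec{x})\) with

$$ \delta = \epsilon \max \nolimits _{i} \mathopen{\lVert }\vec{p}_{i}\mathclose{\rVert }_{1} \leq \epsilon \sqrt{n} \ , $$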

then \(\nu _{r}\) is \(\epsilon \)-robust around \(\vec{x}\).

Proof

For a proof by contraposition, we show that if \(\nu _{r}\) is not \(\epsilon \)-robust, then \(\nu '_{r}\) is not \(\delta \)-robust either. Let \(\vec{z}\) be an adversarial example for \(\nu _{r}\) with \(\mathopen{\lVert }\vec{z} - \vec{x} \mathclose{\rVert }_{\infty }\leq \epsilon \). By Lemma 5 it follows that \(\mathopen{\lVert }\rho _{k}(\vec{z}) - \rho _{k}(\vec{x})\mathclose{\rVert }_{\infty }\leq \delta \). Therefore \(\rho _{k}(\vec{z}) \in \rho _{k}(\vec{x}) + \delta B_{\infty}^{k}\). And since \(\vec{z}\) is an adversarial example, it follows as desired that

$$ \nu '_{r}(\rho _{k}(\vec{z})) = \nu _{r}(\vec{z})\neq \nu _{r}(\vec{x}) = \nu '_{r}(\rho _{k}(\vec{x})) $$

 □

In other words, proving \(\nu '_{r}\)’s robustness on the \(k\)-dimensional PCA space with radius \(\delta \) directly proves robustness for the entire construct \(\nu _{r}\) with radius \(\epsilon = \frac{\delta}{\sqrt {n}}\). In the case of MNIST, \(n\) is equal to 784. Therefore, proving robustness of \(\nu '_{r}\) for some radius \(\delta \) implies robustness of \(\nu _{r}\) with radius at least

$$ \delta \geq \epsilon \geq \frac{\delta}{28} $$
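The conversion between the two radii is simple arithmetic, sketched below under the assumption that the principal components are available as plain lists of numbers. The function names and the sample value \(\delta = 0.28\) are purely illustrative. The first function uses the actual \(\ell _{1}\) norms of the components, the second the worst-case bound \(\sqrt{n}\) from Corollary 1.

```python
import math
from typing import Sequence


def certified_radius(delta: float, components: Sequence[Sequence[float]]) -> float:
    """epsilon = delta / max_i ||p_i||_1 for the given components."""
    return delta / max(sum(abs(c) for c in p) for p in components)


def worst_case_radius(delta: float, n: int) -> float:
    """Lower bound delta / sqrt(n) that holds for any set of unit-length components."""
    return delta / math.sqrt(n)


# For MNIST, n = 784 and sqrt(n) = 28, so delta is divided by 28 in the worst case.
mnist_worst_case = worst_case_radius(0.28, 784)
```

Since every unit-length vector has an \(\ell _{1}\) norm between 1 and \(\sqrt{n}\), the value returned by `certified_radius` always lies between `worst_case_radius(delta, n)` and `delta`.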

5.4 Improving accuracy

As laid out in Sect. 5.3, PCA can be used to modify a neural network in a manner that makes it much easier to verify at the cost of some predictive accuracy. Fortunately, by modifying not only the neural network itself, but also its training process, some of that lost accuracy can be regained at almost no cost. Figure 9d sketches how verification can be eased and accuracy for low \(k\) improved at the same time. Key to this approach is the observation that in Fig. 9c, the PCA decoder and the first linear layer are adjacent and can therefore be merged into a single linear function with \(k\)-dimensional input and an output dimension defined by the first hidden layer. Thus, rather than just modifying the original classifier via PCA auto-encoding, one can (re-)learn the entire green part through the PCA encoder. This results in a much smaller trained network \(\nu _{t}\) which, in particular, is shielded from the 784 dimensions of MNIST by the PCA decoder. In fact, in our setup,

  • the number of neurons in \(\nu _{t}\) is essentially an order of magnitude smaller than the original net, and

  • the performance of \(\nu _{t} \circ \rho _{k} \) is much better for small \(k\), as shown in Fig. 13.
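The observation that the PCA decoder and the first linear layer collapse into one layer can be illustrated as follows. This is a minimal sketch under assumptions: the decoder is taken to be \(\vec{z} \mapsto \vec{\mu } + \sum _{i} z_{i} \vec{p}_{i}\) with a mean vector \(\vec{\mu }\) (whether the decoder used here includes such an offset is an assumption), the first layer is \(\vec{x} \mapsto W\vec{x} + \vec{b}\), and all names are hypothetical. In the approach of Fig. 9d the merged layer is not computed like this but learned directly; the sketch only shows why a layer of this reduced shape suffices.

```python
from typing import List, Sequence, Tuple

Matrix = Sequence[Sequence[float]]


def merge_decoder_into_layer(weight: Matrix, bias: Sequence[float],
                             components: Matrix, mean: Sequence[float]
                             ) -> Tuple[List[List[float]], List[float]]:
    """Fold z -> mean + sum_i z[i] * p_i into the layer x -> W x + b.

    W (mean + P^T z) + b = (W P^T) z + (W mean + b), so the merged layer
    has one weight column per principal component instead of one per pixel.
    """
    merged_weight = [[sum(wj * pj for wj, pj in zip(row, p)) for p in components]
                     for row in weight]
    merged_bias = [bi + sum(wj * mj for wj, mj in zip(row, mean))
                   for row, bi in zip(weight, bias)]
    return merged_weight, merged_bias


def affine(weight: Matrix, bias: Sequence[float], v: Sequence[float]) -> List[float]:
    """Evaluate the affine layer v -> W v + b."""
    return [sum(wj * vj for wj, vj in zip(row, v)) + bi
            for row, bi in zip(weight, bias)]
```

Applying the merged layer to \(\vec{z}\) gives the same result as decoding \(\vec{z}\) first and then applying the original layer, but its weight matrix has only \(k\) columns.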

6 Experimental results

In the following, we will showcase experimental results regarding the TADS-based verification of neural networks using PCA to reduce the dimensionality of the verification problem. We will start by considering the reduction to two dimensions, allowing us to visualize the process and showcase its workings conceptually. Afterwards, we will move towards higher dimensions, examining more concrete questions of scalability.

6.1 Conceptual showcase and visualization

For this section, we consider the neural network classifier

$$ \nu _{c} = \underbrace{{\operatorname*{arg\,max}} \circ { \mathopen{[\!\![}\nu '\mathclose{]\!\!]}}}_{ \text{classifier}} {}\circ \rho _{2} $$

where \(\nu '\) is a fully connected ReLU-network with 5 layers of 10 neurons each. Training is done on the MNIST training set with batches of 300 images per training step using standard settings of the Adam optimizer [32]. This classifier uses the two-dimensional PCA representation, which allows us to plot the function represented by the classifier part \(\nu '_{c}\) as done in Fig. 14.

We consider the sample \(\vec{x}_{9}\) shown in Fig. 15. This image is classified correctly by \(\nu _{c}\), being assigned the label “9”. However, as we will see, this classification is very unstable.

Using TADS, we can gain insight into this prediction by creating the class characterization TADS

$$ t^{9}_{\nu _{c}} = t_{\nu '} \Join t_{a} \Join t_{x=9} $$

for \(\nu '_{c}\) and class “9” on the infinity ball \(\vec{x}_{9} + 0.3 \cdot B^{2}_{\infty}\). This TADS is shown in Fig. 16. Its terminal node “1” represents a correct classification, and its terminal node “0” an incorrect one.

Moreover, we can visualize the function plot corresponding to this TADS as shown in Fig. 14. Note that lines in this plot indicate decision boundaries that are implied by the non-terminal nodes in the TADS. These decision boundaries separate the regions of the piece-wise affine function encoded by the neural network. As a consequence, each polygon that is enclosed by such linear boundaries corresponds to precisely one path in the TADS \(t^{9}_{\nu _{c}}\).

One can immediately observe that while \(\nu _{c}\) classifies \(\vec{x}_{9}\) correctly, there exists a close region of inputs that are classified incorrectly. Using the information contained in the TADS, it is trivial to obtain adversarial examples by picking any path in the TADS ending in the “0” terminal and finding a point satisfying the corresponding path condition. One adversarial example generated in this way is shown in Fig. 15. Observe that, while being classified differently by \(\nu _{c}\), both images are almost identical to the human eye, which indicates that this neural network might not be entirely trustworthy even though it classified \(\vec{x}_{9}\) correctly.

6.2 Scaling to higher dimensions

After showcasing our verification approach conceptually on a 2-dimensional problem, we now move towards higher dimensions and seek to examine how the addition of new dimensions affects scalability. To do this, we construct a neural network classifier that uses a 6-dimensional input representation instead of a 2-dimensional one (all other settings are equal)

$$ \nu _{c} = \underbrace{{\operatorname*{arg\,max}} \circ { \mathopen{[\!\![}\nu '\mathclose{]\!\!]}}}_{ \text{classifier}} {}\circ \rho _{6} \ . $$

This increase in dimension drastically improves the accuracy, however, at the price of an explosion in the size of the corresponding TADS. All reported numbers are averages over six random runs.

Accuracy

The six-dimensional neural network classifier \(\nu _{c}\) achieved roughly 74% accuracy on the test set. This is below the 91% accuracy of the original unrestricted network, but much better than the 46% accuracy of the two-dimensional classifier (cf. Fig. 13).

We also tested different dimensions for PCA with respect to network accuracy, the results of which can be found in Fig. 12. These results show that in this case, a dimensionality reduction by an order of magnitude still allows one to achieve 90% accuracy, which is very close to the 91% accuracy of the original network.

Fig. 12

A plot showing the dependence of the neural network accuracy on the number \(k\) of input dimensions allowed in the input encoding. Multiple networks were trained with different initializations for 5 epochs. Error bars illustrate \(95\%\) confidence interval. Parameters: PyTorch framework with random seeds 0, 5, 10, 15, 20, 25, 42; network with 5 layers, 10 neurons per layer, ReLU activation, kaiming normal initialization, Adam optimizer, cross-entropy loss. PCA implementation of SciPy

Fig. 13

Comparison of predictive accuracy on MNIST’s test set for the modified PLNNs of Sect. 5.3 and Sect. 5.4. The violet line (“variant 5.1”) shows the reference accuracy of an unmodified PLNN with same architecture and hyperparameters. All networks were trained as in Fig. 12, but only the accuracy after the 5-th epoch is shown

Fig. 14

A function plot representing the TADS of \(\nu '_{c}\) in an area around \(\vec{x}_{9}\). The yellow area contains points that are classified correctly, the black area contains points that are classified incorrectly. The blue point represents \(\vec{x}_{9}\) and the red point represents the adversarial example \(\vec{x}_{5}\) (Color figure online)

Fig. 15

MNIST sample that represents the number “9” (left) and a close adversarial example that is classified as “5” (right). The difference between the two is marginal (center). The adversarial was found in a neighboring linear region using a restricted TADS (cf., Fig. 16). Vectors are visualized using a perceptually uniform diverging color palette (Seaborn’s “icefire”). Idea of representation [21]

Fig. 16

A TADS representing the behavior of \(\nu '\) around \(\vec{x}_{9}\). For readability, this TADS is constructed such that \(\vec{x}_{9}\) corresponds to the vector \((0,0)\). The terminal node “1” represents a correct classification, the node “0” an incorrect one

Scalability

In the two-dimensional case, we showed an example where robustness around some input could be disproven with a radius of \(\delta =0.3\), which according to Lemma 5 corresponds to a radius of

$$ \epsilon =\frac{\delta}{\max _{i} \mathopen{\lVert }p_{i}\mathclose{\rVert }_{1}} \approx \frac{0.3}{20}=0.015 $$

for the 784-dimensional network.Footnote 9 The corresponding TADS, describing network behavior in the space of interest, had 51 nodes and could be handled quite easily. We repeat this experiment with the input image shown in Fig. 17 and the six-dimensional neural network instead. The TADS resulting from this experiment possesses roughly 4600 nodes. This is still manageable computationally, but indicates the expected explosion in size.

Fig. 17

An MNIST sample image \(\vec{x}_{1}\) representing the digit “1”

7 Related work

The topic of robustness has been widely discussed in the machine learning community ever since it first gained attention in 2013 [49]. One topic of interest is research into heuristic methods that quickly and reliably find adversarial examples for modern neural networks, serving to understand how adversarial examples occur and therefore how they might be mitigated [12, 21]. As these methods are devised by the machine learning community, it is not surprising that they typically leverage the machine learning toolbox, using gradient descent and other training heuristics to find adversarial examples.

Another natural topic with respect to robustness has been constructing neural networks that are reliably robust after training. A typical approach to this is defensive distillation [42]. Defensive distillation seeks to secure a previously trained neural network. This is achieved by using the outputs of the first neural network to train a second neural network with equivalent architecture. This process is called distilling. The additional information provided by the first neural network allows for efficient training in fewer training steps, reducing the need for large parameter values and therefore reducing the risk of adversarial attacks. Other approaches directly modify the training process to ensure scalability, usually by introducing additional regularization terms that are meant to steer the training process into a robust direction, often working in tandem with formal methods [27, 52, 58].

Closer to our approach are neural network verification approaches (for robustness). They can be split into two categories, approaches based on branch-and-bound tree search algorithms [15, 34] and approaches based on abstract interpretation [14].

Neural network verification—tree search

Much like SAT and SMT solvers, these approaches use a branch-and-bound tree search algorithm to find a counterexample to the property of interest. A critical part of this is finding an apt ReLU configuration, i.e., which neuron activation values need to be set to 0 by the ReLU activation function and which do not. This corresponds to finding a satisfiable path in a TADS that contains a counterexample, which makes TADS-based verification inherently a representative of this category.

Other examples include Reluplex [31], one of the earliest scalable neural network verifiers, and alpha-beta-crown [53], a modern method that can be regarded as current state-of-the-art [6]. Methods of this type differ mostly in the heuristics that guide their branching and bounding.

Tree search methods are accurate and leading in practice, but they tend to be more time-intensive than methods based on abstract interpretation. Moreover, they are, much like TADS, naturally restricted to piece-wise affine neural networks and cannot cover activation functions such as sigmoid or softmax.

Neural network verification—abstract interpretation

These neural network verifiers define an abstract interpretation of neural networks to attain an overapproximation of the reachable states that a neural network can output on a given input region [17]. As these methods compute an overapproximation of the truly reachable states, they are safe, but not complete, i.e., they might incorrectly state that a given property is violated when it is not. On the flipside, abstract interpretation verifiers are typically computationally quite efficient and extend to neural networks that are not piece-wise affine. Examples of verifiers based on abstract interpretation include AI2 [17] and DeepPoly [47]. Our TADS-based approach naturally also applies to abstractly interpreted neural networks.

8 Conclusion

In this paper, we have applied TADS, a whitebox representation of neural networks, to the problem of neural network robustness. To apply TADS to this problem, we have introduced precondition projection and shown how to extend the argmax function, which is typically used with neural networks in classification tasks, to generate TADS that precisely describe a neural network's classification behavior in a given area around a fixed input point. Choosing the considered robustness region as precondition, robustness becomes equivalent to the property that the entire corresponding TADS collapses to one node that then characterizes the robust classification. If this is not the case, the resulting TADS explicitly represents the set of all adversarial examples.

This unique power of TADS-based robustness verification comes at the price of an exponential complexity, which we have proposed to mitigate via PCA-based dimensionality reduction by focussing the verification on the image of a low-dimensional PCA encoding. Three versions of this approach have been discussed:

  • An approximative version that can be regarded as an elaborate search heuristics for adversarial examples,

  • A transformational approach where the PLNN is extended by a preprocessing step defined by PCA-based auto-encoding, and which allows one to infer robustness of the 784-dimensional transformed network based on the analysis of the corresponding low-dimensional PCA space, and

  • An approach that is based on a modified learning process, specifically tailored to the corresponding PCA-based encoding. This method leverages the machine learning toolbox to improve the accuracy of the 784-dimensional transformed network while still allowing low-dimensional robustness verification.

We believe that dimensionality reduction, as illustrated in this paper for PCA, is key to achieving neural networks that are ready for verification. The challenge is to find dimensionality reduction techniques that maintain a high level of accuracy. In our experience, the success of such techniques hinges on characteristics of the application domain. We are optimistic that this approach will widen the scope of applications where neural networks are accepted.