The Power of Typed Affine Decision Structures: A Case Study

TADS are a novel, concise white-box representation of neural networks. In this paper, we apply TADS to the problem of neural network verification, using them to generate either proofs or concise error characterizations for desirable neural network properties. In a case study, we consider the robustness of neural networks to adversarial attacks, i.e., small changes to an input that drastically change a neural network's perception, and show that TADS can be used to provide precise diagnostics on how and where robustness errors occur. We achieve these results by introducing Precondition Projection, a technique that yields a TADS describing network behavior precisely on a given subset of its input space, and combining it with PCA, a traditional, well-understood dimensionality reduction technique. We show that PCA is easily compatible with TADS. All analyses can be implemented in a straightforward fashion using the rich algebraic properties of TADS, demonstrating the utility of the TADS framework for neural network explainability and verification. While TADS do not yet scale as efficiently as state-of-the-art neural network verifiers, we show that, using PCA-based simplifications, they can still scale to medium-sized problems and yield concise explanations for potential errors that can be used for other purposes such as debugging a network or generating new training samples.


Introduction
In recent years, neural networks have been a driving force behind many of the most exciting success stories in machine learning. From image recognition [GWK + 18] and speech recognition [BMR + 20] to playing complex games on a superhuman level [VBC + 19], neural networks have achieved results that were almost unthinkable even a decade ago.
However, while the size, performance, and scope of neural networks steadily increase, their opaqueness remains an equally important and essentially unsolved problem [AB18]. Frequently described as "black-box" models, neural networks make decisions that are to this day hard to explain and, likewise, have properties that are hard to verify.
In this paper, we are concerned with Typed Affine Decision Structures (TADS) [SNMB22], a novel decision-tree-like data structure that represents piece-wise affine functions. TADS are specifically designed to act as interpretable white-box models that can precisely represent any piece-wise linear neural network in an understandable fashion.
While TADS are structurally well-suited for global model explanation and verification of neural networks, the full explanation of even medium-sized neural networks is well out of scope. It is well-known that the semantic complexity of a neural network, with respect to many different measures of complexity, grows exponentially in its size. As a consequence, any precise global explanation of such a model incurs exponential scaling issues [BS14a, MPCB14, FRH + 19].
In this paper, we are interested in applying TADS to verifying local properties of neural networks, most notably robustness properties [CW17, LLWX18, SZS + 13, MST20]. Robustness properties encode that, at certain user-specified points, a neural network's classification is invariant to small changes of its input. For example, in image recognition, if one knows that an image represents a certain object, a single flip of a pixel should not drastically change the network's correct classification of said image. Robustness properties are the most commonly considered properties in neural network verification and make up the majority of current benchmarks in the VNNComp verification competition [BLJ21].
To apply TADS, which are in principle global model explanations, to local properties, we will introduce precondition projection, a transformation of TADS that restricts their domain to a certain region of interest. Further, we show how the algebraic properties of TADS can be used to directly model the classification behavior of a neural network. This is important as neural networks, although often used as classifiers (assigning one of finitely many classes to an input), are fundamentally regression models (assigning real values to their input). By modeling the argmax function directly on a TADS level, this gap can be bridged in an elegant fashion.
Finally, we will present a case study in which we apply TADS to robustness analysis and present its advantages. At present, TADS do not yet scale well to larger problems. We will introduce an approach that uses the well-understood dimensionality reduction technique PCA to prove an underapproximation to the robustness property of interest. This approach mitigates the scaling issues incurred by TADS, but lacks reliable guarantees on robustness. Thus, we introduce another approach that directly trains neural networks to operate on inputs that are simplified by PCA. This method is of similar computational complexity to the underapproximation approach, but yields neural networks for which TADS can give reliable robustness guarantees, while incurring only a small loss in neural network accuracy.
Lastly, we will show on a concrete example what a TADS-based robustness proof looks like and what additional information it yields beyond existing verification tools. We will show how this information can be used to characterize precisely and completely the entire set of inputs that violate a given property, and how it can be used to find "closest" adversarial examples, if they exist.
A real vector $(x_1, \ldots, x_n)$ of $\mathbb{R}^n$ is abbreviated as $\vec{x}$. To refer to its $i$-th component, we write $x_i$ (in contrast, $\vec{x}_i$ denotes the $i$-th vector in some enumeration). The dimension of a real vector space $\mathbb{R}^n$ is given as $\dim \mathbb{R}^n = n$.
A matrix $A$ is a collection of real values arranged in a rectangular array with $m$ rows and $n$ columns. To indicate the number of rows and columns, one says $A$ has type $m \times n$, commonly notated as $A \in \mathbb{R}^{m \times n}$. An element at position $i, j$ of the matrix is denoted by $A_{i,j}$ (where $1 \le i \le m$ and $1 \le j \le n$). A matrix $A \in \mathbb{R}^{m \times n}$ can be reflected along the main diagonal, resulting in the transpose $A^\top$ of shape $n \times m$ defined by the equation $(A^\top)_{i,j} = A_{j,i}$. The $i$-th row of $A$ can be regarded as a $1 \times n$ matrix given by $A_{i,\bullet} := (A_{i,1}, \ldots, A_{i,n})$. Similarly, the $j$-th column of $A$ can be regarded as an $m \times 1$ matrix defined as $A_{\bullet,j} := (A_{1,j}, \ldots, A_{m,j})^\top$. Matrix addition is defined over matrices with the same type to be component-wise, i.e., $(A + B)_{i,j} = A_{i,j} + B_{i,j}$, and scalar multiplication as $(c \cdot A)_{i,j} = c \cdot A_{i,j}$. The (type-correct) multiplication of two matrices $A \in \mathbb{R}^{m \times n}$ and $B \in \mathbb{R}^{n \times k}$ is defined as $(AB)_{i,j} = \sum_{l=1}^{n} A_{i,l} \cdot B_{l,j}$. Identifying $n \times 1$ matrices with (column) vectors, $1 \times n$ matrices with row vectors, and $1 \times 1$ matrices with scalars as indicated above makes the well-known dot product of $\vec{x}, \vec{y} \in \mathbb{R}^n$, $\langle \vec{x}, \vec{y} \rangle := \sum_{i=1}^{n} x_i \cdot y_i$, just a special case of matrix multiplication. The same holds for matrix-vector multiplication, which is defined for an $m \times n$ matrix $A$ and a vector $\vec{x} \in \mathbb{R}^n$ as $(A\vec{x})_i = \sum_{j=1}^{n} A_{i,j} \cdot x_j$. Matrices with the same number of rows and columns, i.e., with type $n \times n$ for some $n \in \mathbb{N}$, are said to be square matrices.

Definition 2.1 (Affine Function).
A function $f : \mathbb{R}^n \to \mathbb{R}^m$ is called affine iff it can be written as $f(\vec{x}) = A\vec{x} + \vec{b}$ for some matrix $A \in \mathbb{R}^{m \times n}$ and vector $\vec{b} \in \mathbb{R}^m$. We identify the semantics and syntax of affine functions with the pair $(A, \vec{b})$, which can be considered a canonical representation of affine functions. Furthermore, we denote the set of all affine functions $\mathbb{R}^n \to \mathbb{R}^m$ as $\Phi_{n \to m}$ with type $(n, m)$. The untyped version $\Phi$ is meant to refer to the set of all affine functions, independently of their type.
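The canonical representation can be sketched in plain Python (no external libraries); the function names are illustrative, not the paper's. Composition of two affine functions $(A, \vec{b}_f)$ and $(B, \vec{b}_g)$ is again affine, with canonical pair $(BA, B\vec{b}_f + \vec{b}_g)$:

```python
# A minimal sketch of affine functions f(x) = A x + b as canonical pairs (A, b).
# Names (apply_affine, compose_affine) are illustrative, not from the paper.

def mat_vec(A, x):
    """Matrix-vector product for A given as a list of rows."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def apply_affine(f, x):
    """Evaluate f(x) = A x + b for f = (A, b)."""
    A, b = f
    return [v + bi for v, bi in zip(mat_vec(A, x), b)]

def compose_affine(g, f):
    """Composition g . f is again affine, with canonical pair (B A, B b_f + b_g)."""
    A, b_f = f
    B, b_g = g
    BA = [[sum(B[i][k] * A[k][j] for k in range(len(A)))
           for j in range(len(A[0]))] for i in range(len(B))]
    Bb = [v + bg_i for v, bg_i in zip(mat_vec(B, b_f), b_g)]
    return (BA, Bb)

f = ([[1.0, 2.0], [0.0, 1.0]], [1.0, -1.0])   # R^2 -> R^2
g = ([[2.0, 0.0]], [0.5])                     # R^2 -> R^1
x = [3.0, 4.0]
lhs = apply_affine(g, apply_affine(f, x))     # g(f(x))
rhs = apply_affine(compose_affine(g, f), x)   # (g . f)(x), same result
```

The two evaluations agree, illustrating closure of affine functions under (type-correct) composition.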
It is well-known that the type resulting from function composition evolves as follows: for $f \in \Phi_{n \to m}$ and $g \in \Phi_{m \to k}$, the composition $g \circ f$ is again affine and has type $(n, k)$, i.e., $g \circ f \in \Phi_{n \to k}$. The type of the operation is important for the closure axiom, the basis for most algebraic structures. This leads to the following well-known theorem [Axl97]:

Theorem 2.3 (Algebraic Properties). Denoting, as usual, scalar multiplication with $\cdot$ and function composition with $\circ$, we have: $(\Phi_{n \to m}, +, \cdot)$ forms a vector space, and affine functions are closed under type-correct composition.

This theorem can straightforwardly be lifted to untyped $\Phi$ by simply restricting all operations to the cases where they are well-typed, i.e., where addition is restricted to functions of the same type ($+_t$), and function composition to situations where the output type of the first function matches the input type of the second ($\circ_t$):

Theorem 2.4 (Properties of Typed Operations). $(\Phi, +_t, \cdot, \circ_t)$ is a typed algebra, i.e., an algebraic structure that is closed under well-typed operations.

Definition 2.5 (Hyperplanes and Halfspace).
Let $\vec{w} \in \mathbb{R}^n$ and $b \in \mathbb{R}$. Then the set $H := \{\vec{x} \in \mathbb{R}^n \mid \langle \vec{w}, \vec{x} \rangle = b\}$ is called a hyperplane of $\mathbb{R}^n$. A hyperplane partitions $\mathbb{R}^n$ into two convex subspaces, called halfspaces. The positive and negative halfspaces of $H$, respectively, are defined as $H^+ := \{\vec{x} \in \mathbb{R}^n \mid \langle \vec{w}, \vec{x} \rangle \ge b\}$ and $H^- := \{\vec{x} \in \mathbb{R}^n \mid \langle \vec{w}, \vec{x} \rangle < b\}$.

Definition 2.6 (Polyhedron).
A polyhedron $P \subseteq \mathbb{R}^n$ is the intersection of halfspaces, $P = \bigcap_{i=1}^{k} H_i$, for some natural number $k$.

Definition 2.7 (Piece-wise Affine Function).
A function $f : \mathbb{R}^n \to \mathbb{R}^m$ is called piece-wise affine if it can be written as $f(\vec{x}) = f_i(\vec{x})$ for $\vec{x} \in P_i$, where $\{P_1, \ldots, P_k\}$ is a set of polyhedra that partitions the space $\mathbb{R}^n$ and $f_1, \ldots, f_k$ are affine functions. We call $f_i(\vec{x}) = A_i\vec{x} + \vec{b}_i$ with $1 \le i \le k$ the function associated with polyhedron $P_i$.

Norms and Distances
Throughout this work, we will often be concerned with the behavior of neural networks and how it changes when a point is slightly altered, and thus with different neighborhoods of points. These are formalized in mathematics using metric spaces and normed spaces [Mag22]. For our purposes, however, a special type of normed space defined by so-called $L^p$-norms is sufficient:

Definition 2.8 (L-Norms).
For $p \in \mathbb{N}$, the $L^p$-norm is the function $\|\cdot\|_p : \mathbb{R}^n \to \mathbb{R}$ defined by $\|\vec{x}\|_p := \left(\sum_{i=1}^{n} |x_i|^p\right)^{1/p}$. Important $L^p$-norms are the $L^1$ norm $\|\vec{x}\|_1 := \sum_{i=1}^{n} |x_i|$ and the $L^2$ norm or Euclidean norm.$^1$ Another important norm is the so-called $L^\infty$ norm. While not technically an $L^p$-norm according to the definition above, it arises naturally as the limit of the $L^p$-norms as $p$ approaches infinity and is defined by $\|\vec{x}\|_\infty := \max_{1 \le i \le n} |x_i|$. With these definitions we can now formalize the neighborhood of a point.

$^1$ For real vector spaces $\mathbb{R}^n$ the Euclidean norm is the canonical norm, as $\|\vec{x}\|_2 = \sqrt{\langle \vec{x}, \vec{x} \rangle}$.
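The norms of Definition 2.8 can be sketched in a few lines of plain Python; function names are illustrative:

```python
# A small sketch of the p-norms and the L-infinity norm from Definition 2.8.

def p_norm(x, p):
    """||x||_p = (sum_i |x_i|^p)^(1/p)."""
    return sum(abs(xi) ** p for xi in x) ** (1.0 / p)

def inf_norm(x):
    """||x||_inf = max_i |x_i|, the limit of the p-norms as p grows."""
    return max(abs(xi) for xi in x)

x = [3.0, -4.0]
l1 = p_norm(x, 1)     # 7.0
l2 = p_norm(x, 2)     # 5.0 (Euclidean)
linf = inf_norm(x)    # 4.0
```

For large $p$, `p_norm(x, p)` approaches `inf_norm(x)`, matching the limit characterization of the $L^\infty$ norm.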

Definition 2.9 (Unit Ball).
For a given $L^p$-norm $\|\cdot\|_p$ we define the corresponding $p$-unit ball as $B_p^n := \{\vec{x} \in \mathbb{R}^n \mid \|\vec{x}\|_p \le 1\}$. (1) The unit ball is a closed, convex subset of $\mathbb{R}^n$ centered at the origin. It generalizes the notion of a disk with radius 1 to both higher dimensions ($n > 2$) and non-Euclidean spaces ($p \ne 2$). The relevant unit balls for this paper are illustrated in Figure 1. Every generic $p$-ball of $\mathbb{R}^n$ can be constructed from the unit $p$-ball by translation and scaling, $\vec{x} + \epsilon B_p^n$. In the latter case $\epsilon$ is called the radius. If it is clear from the context, we omit the dimensionality $n$ of the $p$-ball.

Neural Networks
The following brief introduction to neural networks is based on [GBC16], but its presentation is adapted to better fit the context of this work. Neural networks are perhaps today's most important machine learning models and are most succinctly characterized by their layered structure. There exist numerous neural network architectures that one might consider. For this work, we focus on the very general class of fully connected neural networks and define neural networks as follows:

Definition 2.10 (Piece-wise Linear Neural Networks). A piece-wise linear neural network (PLNN) $N$ with $k$ layers is a machine learning model consisting of an alternating sequence of affine preactivation functions $f_i$ and ReLU activation functions $\sigma$ (one fewer activation than preactivations): $N = f_{k+1}; \sigma; f_k; \cdots; \sigma; f_1$.
For the PLNN to be syntactically correct, the affine functions must be compatible, i.e., the output dimension of each preactivation must match the input dimension of the following.
In accordance with standard neural network terminology, we call the combination of a preactivation $f_i$ with its activation, $\sigma \circ f_i$, the $i$-th layer of the neural network.
In the following paragraphs, preactivations and activations are properly introduced. After that, the semantics of a PLNN can be defined in terms of its components. Lastly, a common complexity measure of PLNNs is presented.

Preactivations. In traditional applications, the concrete affine functions of a PLNN $N$, as defined in Definition 2.10, result from a training process, usually using gradient-descent-based optimization techniques [GBC16], in which the PLNN is trained to accurately predict desired outputs on a given dataset.
Activations. The activation function is an architectural design choice made a priori by the user. The primary purpose of the activation function is to introduce nonlinearity into the neural network, which can drastically increase the class of functions that can be approximated. For the purposes of this paper, we exclusively use the rectified linear unit (ReLU) function.
ReLU. The ReLU function, defined component-wise as $\sigma(\vec{x})_i := \max(0, x_i)$, has proven to be a successful activation function in practical applications, combining convenient properties of linear functions with a sufficient degree of non-linearity. It is prominently recommended as the default choice of activation function for fully connected neural networks [GBC16]. Furthermore, due to the simple structure of the ReLU function, neural networks with ReLU activations lend themselves well to formal analysis and are typically considered in verification tasks [KBD + 17, BLJ21].
As the ReLU activation function is the only activation function we consider, we will use $\sigma$ exclusively to refer to the ReLU function for the remainder of this paper and omit the explicit mention of the dimensionality when it is clear from context.

Definition 2.12 (PLNN Semantics).
The semantics of a piece-wise linear neural network $N = f_{k+1}; \sigma; f_k; \cdots; \sigma; f_1$ is a piece-wise affine function $[\![N]\!] : \mathbb{R}^n \to \mathbb{R}^m$ given by the sequential evaluation of its layers: $[\![N]\!] = f_{k+1} \circ \sigma \circ f_k \circ \cdots \circ \sigma \circ f_1$. For evaluation, a vector $\vec{x} \in \mathbb{R}^n$ is passed layer by layer through the PLNN. The data flow is unidirectional and, using the above notation, from right to left.
Note that, given the close relationship between a PLNN's syntax and semantics, many works in deep learning choose not to clearly separate the syntax and semantics of PLNNs. A transition between the two definitions can easily be achieved by replacing ';' with '$\circ$' in Definition 2.10.
Traditionally, neural networks are visualized as computation graphs, where the nodes are the eponymous "neurons". There, each affine function $f_i : \mathbb{R}^{n_i} \to \mathbb{R}^{n_{i+1}}$ is visualized as a bipartite graph connecting input neurons to output neurons. An example of such a graph is given in Figure 3.
From the representation used in Definition 2.10, the number of neurons of a neural network can be computed through the preactivations as follows: let $N = f_{k+1}; \cdots; f_1$ with $f_i : \mathbb{R}^{n_i} \to \mathbb{R}^{n_{i+1}}$; then the total number of neurons of $N$ is given by $\sum_{i=1}^{k+1} n_{i+1}$. The number of neurons is a natural measure of the "size" of a neural network, and it is well-known that the semantic complexity of the functions a neural network can represent, measured in the number of linear regions needed to characterize them, increases exponentially in its number of neurons [BS14a, MPCB14, FRH + 19].
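A tiny PLNN in the sense of Definition 2.10 can be sketched in plain Python; the architecture and weights below are made up purely for illustration:

```python
# An illustrative PLNN: alternating affine preactivations and component-wise
# ReLU activations. Weights are arbitrary example values, not from the paper.

def relu(x):
    return [max(0.0, xi) for xi in x]

def affine(A, b):
    return lambda x: [sum(a * xi for a, xi in zip(row, x)) + bi
                      for row, bi in zip(A, b)]

# A small PLNN R^2 -> R^2: N = f2 ; relu ; f1 (read right to left).
f1 = affine([[1.0, -1.0], [2.0, 1.0], [0.0, 1.0]], [0.0, 0.0, -1.0])  # R^2 -> R^3
f2 = affine([[1.0, 1.0, 0.0], [0.0, -1.0, 1.0]], [0.0, 0.0])          # R^3 -> R^2

def forward(x):
    return f2(relu(f1(x)))

# Number of neurons = sum of the preactivation output dimensions: 3 + 2 = 5.
y = forward([1.0, 2.0])
```

Evaluating `forward([1.0, 2.0])` passes the input right to left through `f1`, the ReLU, and `f2`, exactly as in Definition 2.12.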

Neural Networks Classifiers
As defined in Definition 2.10, PLNNs are fundamentally representations of continuous functions $\mathbb{R}^n \to \mathbb{R}^m$. However, they are frequently employed in classification tasks where the co-domain is instead a discrete set of classes $\{1, \ldots, m\}$. To bridge this gap, one typically proceeds by training a neural network $N : \mathbb{R}^n \to \mathbb{R}^m$ and associating each component $y_i$ of its output $\vec{y} = N(\vec{x})$ with the $i$-th class. Then, the class $i$ with the largest $y_i$ is chosen for classification. This is formalized by the argmax function.

Definition 2.13 (Argmax).
The $m$-dimensional argmax function $\arg\max_m : \mathbb{R}^m \to \{1, \ldots, m\}$ is defined by $\arg\max_m(\vec{x}) = i$ iff $i$ is the smallest index for which $x_i \ge x_j$ holds for all $1 \le j \le m$.
Again, when it is clear from context, we omit the index denoting the dimensionality and simply write $\arg\max$. As described before, the argmax function can be used to convert PLNNs into classifiers. This naturally leads us to define PLNN classifiers.

Definition 2.14 (PLNN Classifiers). For a PLNN $N$ with $[\![N]\!] : \mathbb{R}^n \to \mathbb{R}^m$, the corresponding PLNN classifier $c_N : \mathbb{R}^n \to \{1, \ldots, m\}$ is defined as $c_N = \arg\max \circ [\![N]\!]$.
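The smallest-index tie-breaking of Definition 2.13 and the classifier construction of Definition 2.14 can be sketched as follows; the stand-in network is a made-up toy, and all names are illustrative:

```python
# A sketch of argmax with smallest-index tie-breaking, turning a regression
# output into a class label in {1, ..., m}.

def argmax_first(y):
    """Smallest index i (1-based) with y_i >= y_j for all j."""
    best = 0
    for i in range(1, len(y)):
        if y[i] > y[best]:   # strict: earlier indices win ties
            best = i
    return best + 1          # classes are 1..m, as in the paper

def classifier(net, x):
    """PLNN classifier c_N = argmax o [[N]]."""
    return argmax_first(net(x))

toy_net = lambda x: [x[0], x[1], x[0] + x[1]]   # a made-up stand-in for [[N]]
c1 = classifier(toy_net, [2.0, -1.0])   # outputs [2, -1, 1] -> class 1
c2 = classifier(toy_net, [1.0, 1.0])    # outputs [1, 1, 2]  -> class 3
```

Note that on the tied input `[1.0, 1.0]`, `argmax_first` returns class 1, matching the smallest-index convention.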

Typed Affine Decision Structures
PLNN explanation and verification are challenging because of the complex data flow of PLNNs [SNMB22]. Central to our explanation approach is a decision-tree-like data structure that we call Typed Affine Decision Structure (TADS). Based on the transformation process presented in [SNMB22], it is possible to transform a PLNN $N$ into a semantically equivalent TADS $\tau(N)$. The transformation is based on the common syntactical representation of PLNNs and is compositional in the layers.

Data Structure.
Skipping implementation details, TADS can be introduced intuitively using decision trees. In a decision tree, one distinguishes two types of nodes:
1. Inner nodes have decision predicates. For every possible evaluation of that predicate, the node has exactly one successor.
2. Leaves are elements from a given universe that one wants to distinguish.
For TADS, specifically, leaves are from the universe of affine functions and decision predicates are affine inequalities. An example of a TADS can be found in Figure 4. TADS structurally resemble decision trees, which are widely considered explainable machine learning models, i.e., they can, by virtue of their structure, be understood by a human [GMR + 18]. Based on this introduction using decision trees, one can straightforwardly define TADS.

Definition 2.15 (TADS).
A TADS $T = (V, \to, r)$ is a decision DAG with nodes $V$, edges $\to$, and root $r$, whose nodes have the following two types:
1. Inner nodes are called decisions or predicates. They consist of an affine inequality and two successors, one for the case that the predicate is true and one for the case that it is not.
2. Leaves are also called terminals. They are affine functions and have no successors.
To be syntactically correct, all nodes (i.e., all inequalities and affine functions) must accept input vectors with a fixed number of entries $n$. This is called the input dimension of the TADS. Similarly, all terminals must map input vectors into a common output space. The dimensionality $m$ of this output space is called the output dimension. For given input dimension $n$ and output dimension $m$ we define the set of all TADS as $\Theta_{n \to m}$.
TADS are sequentially evaluated like a decision tree.

Definition 2.16 (TADS Evaluation).
The semantics of a TADS $T \in \Theta_{n \to m}$ is the function $[\![T]\!] : \mathbb{R}^n \to \mathbb{R}^m$ obtained as follows: for an input $\vec{x} \in \mathbb{R}^n$, starting at the root, the decisions are evaluated on $\vec{x}$ and the corresponding successors are followed until a terminal $f$ is reached; the result is $f(\vec{x})$.

Semantically, both PLNNs and TADS represent piece-wise affine functions. Moreover, PLNNs can be transformed into TADS:

Lemma 2.17 (Trinity: PLNNs, TADS, and PAFs).
There exists a semantics-preserving transformation $\tau$ from PLNNs to TADS, i.e., $[\![\tau(N)]\!] = [\![N]\!]$ for every PLNN $N$. TADS are computationally transparent and semantically equivalent to PLNNs.
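TADS evaluation (Definition 2.16) can be sketched in plain Python. The tuple encoding below, with predicates $\langle \vec{w}, \vec{x} \rangle + c \ge 0$ and affine leaves $(A, \vec{b})$, is an illustrative assumption, not the paper's implementation:

```python
# A minimal sketch of TADS evaluation: inner nodes are affine predicates,
# leaves are affine functions; follow decisions from the root to a terminal.

def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))

def evaluate(node, x):
    """Follow decisions on input x until a terminal affine function is hit."""
    while node[0] == "decision":
        _, (w, c), yes, no = node
        node = yes if dot(w, x) + c >= 0 else no
    _, (A, b) = node
    return [dot(row, x) + bi for row, bi in zip(A, b)]

# The 1-dimensional ReLU as a TADS: if x >= 0 return x, else return 0.
relu_tads = ("decision", ([1.0], 0.0),
             ("leaf", ([[1.0]], [0.0])),   # identity on the positive side
             ("leaf", ([[0.0]], [0.0])))   # constant zero on the negative side

pos = evaluate(relu_tads, [2.5])    # [2.5]
neg = evaluate(relu_tads, [-3.0])   # [0.0]
```

The two-leaf TADS above is exactly the piece-wise affine view of the 1-dimensional ReLU: one polyhedron per sign, each with its associated affine function.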

Algebraic Properties.
Much like ADDs and BDDs, TADS inherit the algebraic properties of their leaf algebra. For TADS, the leaf algebra of affine functions forms a vector space. Using lifting, one can directly implement the vector space operations on TADS [SNMB22].

Lemma 2.18 (Lifting).
Lifting addition ($+$) and scalar multiplication ($\cdot$) from affine functions to TADS gives operators that are semantically equivalent to their PAF counterparts, i.e., for all TADS $T_1, T_2 \in \Theta_{n \to m}$ and scalars $c \in \mathbb{R}$: $[\![T_1 + T_2]\!] = [\![T_1]\!] + [\![T_2]\!]$ and $[\![c \cdot T_1]\!] = c \cdot [\![T_1]\!]$. By the lifting theorem of [SNMB22] the algebraic properties are preserved. The composition operator is especially important in the context of neural networks, as neural networks are inherently compositions of piece-wise affine functions.
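For a unary operation such as scalar multiplication, lifting amounts to applying the operation at every leaf while keeping the decision structure intact, so that $[\![c \cdot T]\!](\vec{x}) = c \cdot [\![T]\!](\vec{x})$. A sketch under an illustrative tuple encoding (not the paper's implementation):

```python
# Lifting scalar multiplication to the leaves of a TADS: c * T multiplies
# every terminal affine function (A, b) by c; decisions are unchanged.

def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))

def evaluate(node, x):
    while node[0] == "decision":
        _, (w, c), yes, no = node
        node = yes if dot(w, x) + c >= 0 else no
    _, (A, b) = node
    return [dot(row, x) + bi for row, bi in zip(A, b)]

def scale_tads(c, node):
    """Lift scalar multiplication to the leaves, preserving the structure."""
    if node[0] == "decision":
        _, pred, yes, no = node
        return ("decision", pred, scale_tads(c, yes), scale_tads(c, no))
    _, (A, b) = node
    return ("leaf", ([[c * a for a in row] for row in A], [c * bi for bi in b]))

relu_tads = ("decision", ([1.0], 0.0),
             ("leaf", ([[1.0]], [0.0])),
             ("leaf", ([[0.0]], [0.0])))

doubled = scale_tads(2.0, relu_tads)   # [[2 * T]](x) == 2 * [[T]](x)
```

Binary operations such as TADS addition additionally require a product construction over the two decision structures, which we do not sketch here.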

Principal Component Analysis
Principal Component Analysis (PCA) is one of the most popular techniques for dimensionality reduction and feature extraction [WEG87, AW10, BS14b]. At a high level, it seeks to find, for a given dataset $D \subset \mathbb{R}^n$, a linear subspace $V \subset \mathbb{R}^n$ with dimension $\dim(V) < n$ that can be used to encode $D$ with as little reconstruction loss as possible.
Such an encoding is useful for machine learning algorithms as it can drastically reduce the input dimension. Large input dimensions can be very problematic in machine learning and entail numerous potential problems, altogether known as the curse of dimensionality [TK06].
The fundamental objects of PCA are the eponymous principal components that are defined as follows:

Definition 2.22 (Principal Components). For a given dataset $D = \{\vec{x}_1, \ldots, \vec{x}_s\} \subset \mathbb{R}^n$ with $s \ge n$ and zero mean $\sum_{\vec{x} \in D} \vec{x} = \vec{0}$, there exist $n$ principal components $\vec{p}_1, \ldots, \vec{p}_n \in \mathbb{R}^n$ which are characterized as iterative solutions to the following optimization problem: the $i$-th principal component $\vec{p}_i$ maximizes the variance of the data when it is projected onto $\vec{p}_i$, $\sum_{\vec{x} \in D} \langle \vec{x}, \vec{p}_i \rangle^2 \to \max$, under the constraint that $\vec{p}_i$ has unit length and is orthogonal to all previous principal components, $\forall h < i : \langle \vec{p}_i, \vec{p}_h \rangle = 0$.

Note that every dataset with non-zero mean, i.e., $\frac{1}{|D|} \sum_{\vec{x} \in D} \vec{x} = \vec{\mu}$ with $\vec{\mu} \ne \vec{0}$, can be made to obey the restriction by performing the following transformation on each datapoint: $\vec{x}' = \vec{x} - \vec{\mu}$. By definition, the principal components are pair-wise orthogonal and normed and therefore linearly independent. Thus, they form a basis of $\mathbb{R}^n$. It follows that there is a natural, unique representation based on the principal components, $\pi(\vec{x}) = (\pi_1, \ldots, \pi_n)$, such that $\vec{x} = \sum_{i=1}^{n} \pi_i \vec{p}_i$. In particular, in the case of PCA, the $\pi_i$ can be computed as $\pi_i = \langle \vec{x}, \vec{p}_i \rangle$.
With this, PCA can naturally be used as a dimensionality reduction tool.

Definition 2.23 (PCA Dimensionality Reduction). Let $0 < k < n$. For some $\vec{x} \in \mathbb{R}^n$ with $\pi(\vec{x}) = (\pi_1, \ldots, \pi_n)$, the $k$-dimensional PCA representation is given by cutting off the PCA representation after the $k$-th element: $\pi_k(\vec{x}) := (\pi_1, \ldots, \pi_k)$. Consequently, the $k$-dimensional PCA reconstruction of $\vec{x}$ is given as $\rho_k(\pi_1, \ldots, \pi_k) := \sum_{i=1}^{k} \pi_i \vec{p}_i$. As $\pi_k$ is a projection for $k < n$, it loses information. Therefore, the PCA reconstruction after dimensionality reduction is approximative.
In essence, the composition of PCA encoding and reconstruction forms a function that is close to the identity function on the dataset and its surrounding points while reducing the number of dimensions needed to express the data. The success of PCA is heavily dependent on the dataset being mainly distributed along a linear subspace of $\mathbb{R}^n$, and its generalization performance requires that new data follow the same distribution as the training data. However, if these assumptions hold, it is a very good approximation, as indicated by the following defining property of the principal components:

Lemma 2.24. The principal components are exactly those vectors that minimize the reconstruction error over $D$ among all linear, orthogonal encoders and decoders using $k$ dimensions [AW10]. That is, for all orthogonal, linear functions $e : \mathbb{R}^n \to \mathbb{R}^k$ and $d : \mathbb{R}^k \to \mathbb{R}^n$, the reconstruction error $\sum_{\vec{x} \in D} \|\vec{x} - d(e(\vec{x}))\|_2^2$ is minimized by $e = \pi_k$ and $d = \rho_k$.

PCA is attractive for multiple reasons. First, PCA representations and approximations are linear functions, which makes them easy to work with. Second, PCA supports reductions to $k$ dimensions for any $0 < k \le n$, which makes PCA very flexible. Lastly, but perhaps most importantly, PCA is a well-understood and well-proven method in practice and can elegantly enable strong performance in even relatively simple machine learning models. An example of a PCA encoding and reconstruction is shown in Figures 5a to 5c. PCA allows the compression of high-dimensional inputs into lower-dimensional representations such that a given dataset is compressed with as little information loss as is possible using orthogonal, linear compression.
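A tiny end-to-end PCA in plain Python illustrates Definitions 2.22 and 2.23 for $n = 2$ and $k = 1$: center the data, take the leading eigenvector of the 2x2 covariance matrix in closed form, then encode and reconstruct. The dataset and all names are made up for illustration:

```python
# Illustrative 2-dimensional PCA with k = 1, using the closed-form leading
# eigenvector of a 2x2 symmetric matrix (angle theta = 0.5 * atan2(2b, a - c)).
import math

data = [(1.0, 1.1), (2.0, 1.9), (-1.0, -0.9), (-2.0, -2.1)]  # roughly on y = x

# Center the data (Definition 2.22 requires zero mean).
mx = sum(x for x, _ in data) / len(data)
my = sum(y for _, y in data) / len(data)
centered = [(x - mx, y - my) for x, y in data]

# Covariance matrix [[a, b], [b, c]] up to the 1/s factor, which does not
# change the eigenvectors.
a = sum(x * x for x, _ in centered)
b = sum(x * y for x, y in centered)
c = sum(y * y for _, y in centered)

# Leading principal component (unit length by construction).
theta = 0.5 * math.atan2(2 * b, a - c)
p1 = (math.cos(theta), math.sin(theta))

def encode(v):   # pi_1(v) = <v, p1>
    return v[0] * p1[0] + v[1] * p1[1]

def decode(s):   # rho_1(s) = s * p1
    return (s * p1[0], s * p1[1])

v = centered[0]
v_rec = decode(encode(v))   # close to v, since the data lies near span(p1)
```

Because the toy data lies almost on the line $y = x$, the reconstruction error of `decode(encode(v))` is small, in line with Lemma 2.24.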

Introduction to MNIST
In the remainder of the paper we consider the problem of digit recognition using the MNIST dataset [Den12]. The MNIST dataset provides a traditional baseline problem for machine learning. While simpler than modern, large-scale machine learning tasks, MNIST requires PLNNs of relevant size for satisfactory classification and stands to this day as an introductory problem in verification benchmarks [BLJ21]. The MNIST dataset consists of 70,000 gray-scale images of hand-written digits, each labeled with the digit it represents to a human observer. The dataset is split into 60,000 training images and 10,000 test images. A challenge for classification problems like this is to control so-called adversarial examples, as discussed in the following.

Robustness to Adversarial Examples
In essence, robustness is the absence of adversarial examples, which are perhaps the most well-known manifestations of chaotic behavior of neural networks and have received wide attention in research [SZS + 13, GSS14, KGB + 16]. We work with the following definition of adversarial examples:

Definition 3.1 (Adversarial Example).
Let $c_N : \mathbb{R}^n \to \{1, \ldots, m\}$ be a PLNN classifier. Further, let $\vec{x} \in \mathbb{R}^n$ be a given point of interest that is correctly classified by $c_N$. Then, $\vec{x}' \in \mathbb{R}^n$ is an $\epsilon$-adversarial example to $\vec{x}$ iff $\|\vec{x}' - \vec{x}\|_\infty \le \epsilon$ and $c_N(\vec{x}') \ne c_N(\vec{x})$. If $c_N$ admits no $\epsilon$-adversarial examples for a given input $\vec{x}$, then it is called $\epsilon$-robust around $\vec{x}$.
Intuitively, an adversarial example is a slight perturbation of an input that, although minor, changes the neural network's prediction. Note that in image recognition problems such as MNIST, the restriction $\|\vec{x}' - \vec{x}\|_\infty \le \epsilon$ encodes that between $\vec{x}'$ and $\vec{x}$, each pixel can only differ by at most $\epsilon$. In practice, adversarial examples can be almost imperceptible to a human [LLWX18, SZS + 13] while arbitrarily altering previously correct decisions, sometimes yielding outlandish classification results, which may enable outside attacks on neural network systems. Thus, it is critical that neural networks cannot be adversarially attacked at points where the desired semantics is clear.$^6$

$^6$ Of course, neural networks necessarily have regions where the prediction flips from one class to another. Ideally, these flips should only occur in regions where inputs are non-sensical and would not intuitively be assigned to any class by a human. Therefore, robustness is usually considered only at some select sample inputs where the semantics is clear.
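The predicate of Definition 3.1 is easy to state in code. The classifier below is a made-up toy, and all names are illustrative:

```python
# A sketch of Definition 3.1: x' is an eps-adversarial example to x iff
# ||x' - x||_inf <= eps and the classifier's decision changes.

def inf_norm(v):
    return max(abs(vi) for vi in v)

def toy_classifier(x):
    """Classifies by which coordinate is larger (smallest index wins ties)."""
    return 1 if x[0] >= x[1] else 2

def is_adversarial(classify, x, x_prime, eps):
    close = inf_norm([a - b for a, b in zip(x_prime, x)]) <= eps
    return close and classify(x_prime) != classify(x)

x = [0.6, 0.5]                                               # classified as 1
adv = is_adversarial(toy_classifier, x, [0.5, 0.6], 0.1)     # flips to class 2
far = is_adversarial(toy_classifier, x, [0.0, 1.0], 0.1)     # outside the ball
```

Note that `far` is rejected purely by the distance condition, even though the classification differs.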

Verifying Robustness
Generally, PLNN verification is the task of proving a property of the result of a PLNN where the input is restricted to a given domain [BTT + 18, A + 21]. Formally, let $N$ be a PLNN with semantics $[\![N]\!] : \mathbb{R}^n \to \mathbb{R}^m$, let $D \subset \mathbb{R}^n$ be a restriction of the input domain, and let $\varphi : \mathbb{R}^m \to \{0, 1\}$ be a predicate. Then PLNN verification is the task of proving, or refuting with a counterexample, the formula $\forall \vec{x} \in D : \varphi([\![N]\!](\vec{x}))$. (2) For the case of verifying $\epsilon$-robustness around $\vec{x}$ for $c_N$, we can formulate (2) specifically as (cf. Definition 3.1) $\forall \vec{x}' \in \vec{x} + \epsilon B_\infty : c_N(\vec{x}') = c_N(\vec{x})$.
Corresponding state-of-the-art verification tools use different methods, such as [BTT + 18]:

- Satisfiability Modulo Theories
- Mixed Integer Programming
- Branch and Bound
For more information, see Section 7 on related work.

Extending TADS to Cover Robustness Properties
TADS are characterized by:
1. Global explanations, i.e., they explain the behavior of a PLNN over the entire space of possible inputs. Robustness properties, however, concern only the relatively small neighborhood $\vec{x} + \epsilon B_\infty$ of a point $\vec{x}$.
2. Regression behavior, i.e., they represent a continuous function. With respect to robustness, however, we are interested in the behavior of the associated PLNN classifier.
The following two subsections will show that TADS are nevertheless well-suited to deal with robustness properties.

Precondition Projection on TADS
When studying adversarial examples, one may use the strict preconditions (given as infinity balls $\epsilon B_\infty$) to reduce the workload. Given the strong connection between affine functions and (convex) polytopes, it is a straightforward procedure to apply polyhedral preconditions, such as the infinity balls required for robustness properties, to TADS. Note that stronger preconditions result in less work. Given a TADS representing a piece-wise affine function $f : \mathbb{R}^n \to \mathbb{R}^m$, we are interested in the behavior of $f$ on a given (small) polyhedron $P \subset \mathbb{R}^n$. In other words, we are interested in the function $f|_P : P \to \mathbb{R}^m$, which is given by $f|_P(\vec{x}) = f(\vec{x})$ for $\vec{x} \in P$. Technically, this is implemented by encoding the polytope $P$ as a TADS using its defining affine inequalities. By explicitly eliminating paths that lead to ⊥ (see [SNMB22]), the resulting TADS is significantly reduced in size.
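As a concrete illustration of such a polyhedral precondition, an infinity ball $B_\infty^\epsilon(\vec{x})$ in $\mathbb{R}^n$ is exactly the polyhedron defined by the $2n$ affine inequalities $x_i \le \hat{x}_i + \epsilon$ and $-x_i \le -(\hat{x}_i - \epsilon)$. The encoding and names below are illustrative, not the paper's implementation:

```python
# Encoding an infinity ball as a polyhedron, i.e., a list of halfspace
# constraints (w, b) meaning <w, x> <= b.

def ball_constraints(center, eps):
    """B_inf^eps(center) as 2n halfspace constraints."""
    n = len(center)
    cons = []
    for i in range(n):
        upper = [0.0] * n
        upper[i] = 1.0
        cons.append((upper, center[i] + eps))     #  x_i <= c_i + eps
        lower = [0.0] * n
        lower[i] = -1.0
        cons.append((lower, -(center[i] - eps)))  # -x_i <= -(c_i - eps)
    return cons

def in_polyhedron(x, cons):
    return all(sum(wi * xi for wi, xi in zip(w, x)) <= b for w, b in cons)

cons = ball_constraints([0.5, 0.5], 0.1)
inside = in_polyhedron([0.55, 0.45], cons)   # within the ball
outside = in_polyhedron([0.7, 0.5], cons)    # violates x_1 <= 0.6
```

Each constraint `(w, b)` corresponds directly to one decision node of the precondition TADS.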

Argmax on TADS for Classification
Neural networks are frequently used for classification, as outlined in Section 2.4.1. As described there, the neural network classifier associated with a given neural network $N$ is $c_N = \arg\max \circ [\![N]\!]$. For a neural network that is meant to be used as a classifier, it is important that one analyzes it with respect to its classification behavior. We know how to construct a TADS for any PLNN $N$. On the other hand, it is also easy to see how a TADS can be constructed for $\arg\max$: intuitively, such a TADS only needs to perform a linear search for the maximum of $\vec{x} = (x_1, \ldots, x_m)$ from $1$ to $m$. Figure 8 illustrates this for the three-dimensional argmax.$^7$ This TADS first compares $x_1$ and $x_2$ in the first layer, then compares their maximum with $x_3$ to attain the result. The extension to higher dimensions is straightforward.
Taken together, it is straightforward to construct the classification TADS using TADS composition as $T_{c_N} = T_{\arg\max} \circ T_N$.
The semantical correctness of this construction follows directly from the correctness of the TADS composition, i.e., $[\![T_{\arg\max} \circ T_N]\!] = \arg\max \circ [\![N]\!] = c_N$.

Verifying Robustness on MNIST Using TADS
The following subsections of this section present four approaches to robustness verification via TADS and illustrate them using the MNIST dataset:
- A straightforward approach where the considered PLNN is directly transformed into a TADS (cf. Figure 9a). This approach typically does not scale due to the typical exponential explosion of the TADS transformation.
- An approximative approach based on PCA-based dimensionality reduction that scales and provides a good heuristic to search for adversarial examples, but is insufficient to prove robustness (cf. Figure 9b). In this case, the TADS-based analysis only covers the subspace that can be 'reached' from the initial, low-dimensional PCA-based vector space via decoding and adequate basis transformation, as indicated by the blue part. Thus, this approach cannot guarantee that the analysis of the TADS is sufficient to reveal all adversarial examples of the original PLNN.
- A transformational approach based on PCA-based dimensionality reduction, where the PLNN is extended by a preprocessing step defined by PCA-based autoencoding, i.e., the composition of a PCA-based dimensionality reduction followed by a linear function that embeds (decodes) the low-dimensional space into the original space (cf. Figure 9c). Here we can show that analyses of the partial extension that start with the decoding are sufficient to obtain robustness results for the extended PLNN that is defined on the 784-dimensional space of MNIST.

$^7$ Note that this TADS deviates slightly from the representation of TADS we use for the rest of this paper, notably with respect to the way linear inequalities are represented. This is purely done to enhance readability.
- A modification of the third approach, where the linear function defined by the composition of the decoder and the initialization layer of the original net is replaced by a linear layer, providing a network architecture with the same number of layers but with a strongly reduced input dimension (cf. Figure 9d). The PLNN considered for verification is now the result of a learning process using the same sample set as in the other cases, but starting with a PCA-based reduction step. Technically, the subsequent TADS-based robustness analysis proceeds exactly as before, guaranteeing that the robustness result proven for the dark green part can again be lifted to the overall net.
We will show that the third and fourth approaches allow us to prove full robustness in a computationally efficient manner. However, they come at the price of modifying the PLNN. In our eyes, this is no disadvantage as long as the modified PLNN is still sufficiently accurate: neural networks are themselves only the results of a heuristic training process and have no intrinsic merit beyond their predictive accuracy. In fact, the results shown in Figure 12 indicate that predictive accuracy can still be achieved after a significant reduction in dimensionality, drastically easing formal verification.

Full Verification with TADS
At their baseline, TADS are so-called model explanations [GMP19] of PLNNs, i.e., for any classification PLNN $c_N : \mathbb{R}^n \to C$, a corresponding TADS can be generated that represents the same function in an easily comprehensible and analyzable manner. Of course, the global behavior of neural networks is usually too large to be represented with a TADS. However, in the case of robustness verification, we are only interested in the behavior of $c_N$ in the neighborhood around some point of interest $\vec{x}$, formalized by an infinity ball (see Definition 2.9). Recall from Definition 3.1 that $\epsilon$-robustness for a point $\vec{x}$ is formalized by the property $\forall \vec{x}' \in \vec{x} + \epsilon B_\infty : c_N(\vec{x}') = c_N(\vec{x})$. Equivalently, this problem can also be stated as $c_N(\vec{x} + \epsilon B_\infty) = \{c_N(\vec{x})\}$, that is, the neighborhood of $\vec{x}$ defined by the infinity ball $B_\infty$ of dimension $n$ with radius $\epsilon$ is classified consistently as one class. This property can be verified by restricting the classification TADS of $c_N$ to the precondition $\vec{x} + \epsilon B_\infty$ and checking that $c_N(\vec{x})$ is its only terminal. The correctness of this procedure follows directly from the correctness results regarding TADS that were established in Section 4.
The approach of directly verifying the original network, sketched at the very left of Figure 9, only works for quite small MNIST networks. The core reason for this scaling problem is the dimensionality of MNIST: with 784-dimensional inputs, the volume of the ε-ball around x⃗ is proportional to ε⁷⁸⁴, which grows quite quickly with ε, leading to intractably large TADS.
The complexity of robustness verification increases exponentially with the number of input dimensions. Reducing dimensionality is therefore key to mitigating scaling issues and to proving robustness for larger ε.

PCA-Guided Validation
To improve the scalability of TADS-based verification, one might consider approximative robustness instead. More concretely, instead of searching for adversarial examples in the full ball x⃗ + ε · Bⁿ∞, we present an approach that restricts the search to a lower-dimensional subset S ⊂ x⃗ + ε · Bⁿ∞. This yields an underapproximation of robustness: if an adversarial example is found in S, it also exists in x⃗ + ε · Bⁿ∞ and robustness is violated. However, the absence of adversarial examples in S does not imply their absence in the full ball. As search directions we use the principal components of the training data, the first six of which are visualized in Figure 10. Recall from Definition 2.23 that the principal components are precisely those directions along which a given dataset scatters most. They are therefore natural candidates to explore in a heuristic search for adversarial examples.
Let x⃗ be some point for which we seek to find adversarial examples. Then, we can define the k-dimensional PCA space around x⃗ as the set {x⃗ + λ₁ · v⃗₁ + … + λₖ · v⃗ₖ | λ₁, …, λₖ ∈ R}, where v⃗₁, …, v⃗ₖ are the first k principal components. This space contains all vectors that are reachable from x⃗ along the principal components or, equivalently, the image of the PCA decoding function around x⃗. Intersecting it with the infinity ball yields a search space S for adversarial examples. Observe that S is by definition a subset of x⃗ + ε · Bⁿ∞ of dimensionality k and that, as both the PCA space and the infinity ball are defined by linear constraints, S can be conveniently expressed as a TADS precondition.
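The construction of this search space can be sketched in a few lines of NumPy. The helper names (`principal_components`, `decode`, `in_search_space`) are illustrative stand-ins; the components are obtained via an SVD of the centered data matrix:

```python
import numpy as np

def principal_components(X, k):
    """First k principal components of data matrix X (rows = samples)."""
    Xc = X - X.mean(axis=0)
    # Rows of Vt are orthonormal directions of decreasing variance.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Vt[:k]                       # shape (k, n)

def decode(x, V, lam):
    """Point reached from x along the principal components with coords lam."""
    return x + lam @ V

def in_search_space(x, V, lam, eps):
    """Membership in S: the decoded point must stay in the eps-ball around x."""
    return np.max(np.abs(decode(x, V, lam) - x)) <= eps
```

A heuristic adversarial search then only has to explore the k coordinates `lam` instead of all n input dimensions.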
Restricting the search for adversarial examples to S via a PCA-based transformation that decodes the vectors of some k-dimensional ball Bᵏ∞, as sketched in Figure 9b, drastically reduces the computational load. However, this reduction comes at a price, which is usually quite high (cf. Figure 11): independently of the choice of k, it can never prove the absence of adversarial examples.
Dimensionality reduction mitigates scalability issues, but can only be considered a heuristic for finding adversarial examples.

Built-in PCA Verification
Fundamentally, neural networks are heuristic models that seek only to achieve high performance, typically defined as the accuracy of their predictions. If a neural network is only as useful as its predictive accuracy, then any change to the network that does not drastically alter its predictive accuracy is acceptable. This opens up a new angle on neural network verification that is unlike traditional program verification: rather than trying to verify a given network as it is, one may well alter the network as long as this does not impair the prediction quality too much. In fact, we consider such a step (often) necessary, as classifiers defined by high-dimensional neural networks will often not be robust, while small alterations may well be.

Figure 9c sketches how the idea of PCA can be used to achieve such an alteration: each input is channeled through the low-dimensional PCA space, which, similar to the situation in the previous section, is simple enough to support verification. However, we will see that, in contrast to the previous section, the special character of the PCA encoding allows us to infer robustness results for the full modified net from robustness results for the PCA space. More concretely, after verifying the robustness of the network composed with the PCA decoder, we establish a robustness result for the full modified net, i.e., the network prefixed with the entire PCA autoencoding. The success of this method very much depends on the accuracy of the modified net, which itself strongly depends on the chosen PCA dimension k. We will discuss this issue in Section 5.4. In the remainder of this section, we show how to infer robustness results for the full n-dimensional vector space from robustness results for the composition with the decoder. The key observation is that PCA preserves neighborhoods:
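A minimal sketch of this modification, assuming an already trained network `net`, the principal components as rows of `V`, and the data mean `mu` (all names hypothetical):

```python
import numpy as np

def make_pca_modified_classifier(net, V, mu):
    """Prefix `net` with a PCA autoencoding, as sketched in Figure 9c.

    `net` stands for the original PLNN, the rows of V are the k
    principal components, and mu is the data mean of the PCA.
    """
    encode = lambda x: V @ (x - mu)             # P_k : R^n -> R^k
    decode = lambda z: mu + V.T @ z             # D_k : R^k -> R^n
    modified = lambda x: net(decode(encode(x))) # the full modified net
    return modified, encode, decode
```

Because the rows of V are orthonormal, encode(decode(z)) = z, i.e., the decoder embeds the k-dimensional verification space faithfully into the n-dimensional input space.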

Lemma 5.2 (PCA Preserves Neighborhoods).
Let Pₖ be the PCA transformation onto the first k principal components v⃗₁, …, v⃗ₖ. For an input x⃗ and an ε-neighbor x⃗′ with ‖x⃗ − x⃗′‖∞ ≤ ε, one can estimate their distance in the image of Pₖ as

‖Pₖ(x⃗) − Pₖ(x⃗′)‖∞ ≤ ε · max₁≤ᵢ≤ₖ ‖v⃗ᵢ‖₁.

Proof.
One can see that the bound is tight by setting x⃗′ = x⃗ + ε · sgn(v⃗ᵢ), where sgn(v⃗ᵢ) is the sign function applied component-wise to v⃗ᵢ and v⃗ᵢ is a principal component of maximal 1-norm. For that x⃗′, equality holds in all estimation steps.
As all v⃗ᵢ have unit length, it is possible to derive an upper bound on ‖v⃗ᵢ‖₁ for every PCA: by the Cauchy-Schwarz inequality, ‖v⃗ᵢ‖₁ ≤ √n. The bound is attained when at least one principal component v⃗ᵢ (for some 1 ≤ i ≤ k) has all components of absolute value 1/√n; in that case the norm is ‖v⃗ᵢ‖₁ = √n. This leads to the following proposition.
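Both the estimate of Lemma 5.2 and its tightness can be checked numerically. Here `v` is a random unit vector standing in for a principal component; the worst-case ε-neighbor from the proof attains the bound exactly:

```python
import numpy as np

rng = np.random.default_rng(0)
n, eps = 784, 0.3

v = rng.normal(size=n)
v /= np.linalg.norm(v)            # unit-length stand-in principal component

x = rng.normal(size=n)
x_prime = x + eps * np.sign(v)    # worst-case eps-neighbour from the proof

# The projection distance attains the bound eps * ||v||_1 exactly ...
proj_dist = abs(v @ (x_prime - x))
assert np.isclose(proj_dist, eps * np.abs(v).sum())

# ... and ||v||_1 is itself bounded by sqrt(n) (Cauchy-Schwarz).
assert np.abs(v).sum() <= np.sqrt(n)
```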

Corollary 5.3.
For every k ≤ n and every set of principal components v⃗₁, …, v⃗ₖ, the PCA transformation Pₖ satisfies

‖Pₖ(x⃗) − Pₖ(x⃗′)‖∞ ≤ √n · ‖x⃗ − x⃗′‖∞.

This suffices to prove the announced robustness result:

Theorem 5.4 (Robustness). Let N : Rⁿ → Rᵐ be a PLNN, and let N̂ = N ∘ Dₖ ∘ Pₖ be its PCA-modified version with encoder Pₖ and decoder Dₖ. If N ∘ Dₖ is δ-robust around Pₖ(x⃗) with δ = √n · ε, then N̂ is ε-robust around x⃗.

Proof.
For a proof by contraposition, we show that if N̂ is not ε-robust, then N ∘ Dₖ is not δ-robust either. Let x⃗′ ∈ Rⁿ be an adversarial example for N̂ with ‖x⃗′ − x⃗‖∞ ≤ ε. By Lemma 5.2 and Corollary 5.3 it follows that ‖Pₖ(x⃗′) − Pₖ(x⃗)‖∞ ≤ √n · ε = δ. Therefore Pₖ(x⃗′) ∈ Pₖ(x⃗) + δ · Bᵏ∞. And since x⃗′ is adversarial, it follows as desired that (N ∘ Dₖ)(Pₖ(x⃗′)) ≠ (N ∘ Dₖ)(Pₖ(x⃗)), i.e., Pₖ(x⃗′) is an adversarial example for N ∘ Dₖ. In other words, proving the robustness of N ∘ Dₖ on the k-dimensional PCA space with radius δ directly proves robustness of the entire construct with radius ε = δ/√n. In the case of MNIST, n equals 784. Therefore, proving robustness of N ∘ Dₖ for some radius δ implies robustness of N̂ with radius ε = δ/√784 = δ/28.
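The corollary, and the resulting radius translation for MNIST, can be validated numerically with a random orthonormal stand-in for the principal components:

```python
import numpy as np

rng = np.random.default_rng(1)
n, k, eps = 784, 6, 0.3

# Random orthonormal columns as stand-in principal components.
V, _ = np.linalg.qr(rng.normal(size=(n, k)))
P = lambda x: V.T @ x                            # PCA transformation R^n -> R^k

x = rng.normal(size=n)
deltas = rng.uniform(-eps, eps, size=(100, n))   # random eps-neighbours of x

# Corollary 5.3: the image distance never exceeds sqrt(n) * eps.
worst = max(np.max(np.abs(P(x + d) - P(x))) for d in deltas)
assert worst <= np.sqrt(n) * eps

# Hence a delta-robustness proof in PCA space yields eps = delta/28 on MNIST.
assert np.isclose(np.sqrt(n), 28.0)
```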
Altering the classifier via PCA enables full robustness verification via low-dimensional reasoning. However, as shown in Figure 12, this comes at the cost of accuracy, in particular for small k.

Improving Accuracy
As laid out in Section 5.3, PCA can be used to modify a neural network in a manner that makes it much easier to verify, at the cost of some predictive accuracy. Fortunately, by modifying not only the neural network itself but also its training process, some of that lost accuracy can be regained at almost no cost. Figure 9d sketches how verification can be eased and, at the same time, accuracy for low k can be improved. Key to this approach is the observation that in Figure 9c the PCA decoder and the first linear layer are adjacent and can therefore be collapsed into a single linear function with k-dimensional input and an output dimension given by the first hidden layer. Thus, rather than just modifying the original classifier via PCA autoencoding, one can (re-)learn the entire green part behind the PCA encoder. This results in a much smaller trained network which, in particular, is shielded from the 784 dimensions of MNIST by the PCA decoder. In fact, in our setup,
- the number of neurons in the retrained network is essentially an order of magnitude smaller than in the original net, and
- the performance of the retrained network composed with the PCA encoder is much better for small k, as shown in Figure 13.
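The collapse of decoder and first layer follows from plain weight algebra: if the decoder is z ↦ μ + Vz and the first layer applies x ↦ W₁x + b₁ before the ReLU, their composition is the single linear layer z ↦ (W₁V)z + (W₁μ + b₁). A sketch with random stand-in weights:

```python
import numpy as np

rng = np.random.default_rng(2)
n, k, m = 784, 6, 10

V = np.linalg.qr(rng.normal(size=(n, k)))[0]   # decoder directions (n x k)
mu = rng.normal(size=n)                        # data mean of the PCA
W1, b1 = rng.normal(size=(m, n)), rng.normal(size=m)

# The decoder followed by the first linear layer ...
composed = lambda z: W1 @ (mu + V @ z) + b1

# ... collapses into a single k-dimensional linear layer.
W_new, b_new = W1 @ V, W1 @ mu + b1
merged = lambda z: W_new @ z + b_new

z = rng.normal(size=k)
assert np.allclose(composed(z), merged(z))
```

The merged layer has only m × k weights instead of m × n, which is the source of the order-of-magnitude reduction mentioned above.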
Specifically training a neural network according to the PCA encoding improves accuracy, in particular for small PCA dimensions k.

Figure 13: Comparison of predictive accuracy on MNIST's test set for the modified PLNNs of Section 5.3 and Section 5.4. The violet line ("variant 5.1") shows the reference accuracy of an unmodified PLNN with the same architecture and hyperparameters. All networks were trained as in Figure 12, but only the accuracy after the fifth epoch is shown.

Figure 14: A function plot representing the TADS of the classifier in an area around x⃗₉. The yellow area contains points that are classified correctly, the black area contains points that are classified incorrectly. The blue point represents x⃗₉ and the red point represents the adversarial example x⃗₅.

Experimental Results
In the following, we will showcase experimental results regarding the TADS-based verification of neural networks using PCA to reduce the dimensionality of the verification problem. We will start by considering the reduction to two dimensions, allowing us to visualize the process and showcase its workings conceptually. Afterwards, we will move towards higher dimensions, examining more concrete questions of scalability.

Conceptual Showcase and Visualization
For this section, we consider a neural network classifier based on a fully connected ReLU network N with 5 layers of 10 neurons each. Training is done on the MNIST training set with batches of 300 images per training step using standard settings of the ADAM optimizer [KB14]. The classifier uses the two-dimensional PCA representation, which allows us to plot the function represented by N composed with the two-dimensional PCA decoder, as done in Figure 14. We consider the sample x⃗₉ shown in Figure 15. This image is classified correctly, being assigned the label "9". However, as we will see, this classification is very unstable.
Using TADS, we can gain insight into this prediction by creating the class characterization TADS t₉ for the classifier and class "9" on the infinity ball x⃗₉ + 0.3 · B²∞.

Figure 15: MNIST sample that represents the number "9" (left) and a close adversarial example that is classified as "5" (right). The difference Δ between the two is marginal (center). The adversarial example was found in a neighboring linear region using a restricted TADS (cf. Figure 16). Vectors are visualized using a perceptually uniform diverging color palette (Seaborn's "icefire"). The idea of this representation goes back to [GSS14].

This TADS is shown in Figure 16 and can be interpreted as follows: any input belonging to the set x⃗₉ + 0.3 · B²∞ that reaches the "1" terminal in the TADS depicted in Figure 16 is classified as a "9" by the classifier (which is the desired behavior), while all others are adversarial examples.
Moreover, we can visualize the function plot corresponding to this TADS, as shown in Figure 14. Note that the lines in this plot indicate decision boundaries that are implied by the non-terminal nodes of the TADS. These decision boundaries separate the regions of the piecewise affine function encoded by the neural network. As a consequence, each polygon enclosed by such linear boundaries corresponds to precisely one path in the TADS t₉.
One can immediately observe that while x⃗₉ is classified correctly, there exists a close region of inputs that are classified incorrectly. Using the information contained in the TADS, it is trivial to obtain adversarial examples by picking any path in the TADS that ends in the "0" terminal and finding a point satisfying the corresponding path condition. An adversarial example generated in this way is shown in Figure 15. Observe that, while being classified differently, both images are almost identical to the human eye, which indicates that this neural network might not be entirely trustworthy even though it classifies x⃗₉ correctly.
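Extracting an adversarial example from such a path amounts to a feasibility search over the path's linear inequalities. A real implementation would use an LP solver; the dependency-free grid search below, with a hypothetical path condition A·p ≤ b in the 2-dimensional setting, illustrates the idea:

```python
import numpy as np

def point_on_path(A, b, radius, grid=200):
    """Search a 2D grid of the radius-ball for a point with A @ p <= b.

    A path condition of a TADS is a conjunction of linear inequalities;
    any point satisfying the condition of a path into the "0" terminal
    is an adversarial example.
    """
    ticks = np.linspace(-radius, radius, grid)
    for u in ticks:
        for w in ticks:
            p = np.array([u, w])
            if np.all(A @ p <= b):
                return p
    return None

# Hypothetical path condition: the half-plane below the line u + w = -0.2,
# searched inside the 0.3-ball around the origin (= the encoded sample).
A = np.array([[1.0, 1.0]])
b = np.array([-0.2])
adv = point_on_path(A, b, radius=0.3)
```

Any returned point lies in the considered ball and satisfies the path condition, i.e., it witnesses the misclassification encoded by that path.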

Scaling to Higher Dimensions
After showcasing our verification approach conceptually on a 2-dimensional problem, we now move towards higher dimensions and examine how additional dimensions affect scalability. To do this, we construct a neural network classifier that uses a 6-dimensional PCA representation instead of a 2-dimensional one (all other settings remain equal). This increase in dimension drastically improves the accuracy, however at the price of an explosion in the size of the corresponding TADS. All reported numbers are averages over six random runs.
Accuracy. The six-dimensional neural network classifier achieved roughly 74% accuracy on the test set, lower than the 91% accuracy of the original unrestricted network, but much better than the 46% accuracy of the two-dimensional classifier (cf. Figure 13).
We also tested different dimensions for PCA with respect to network accuracy, the results of which can be found in Figure 12. These results show that in this case, a dimensionality reduction by an order of magnitude still allows one to achieve 90% accuracy, which is very close to the 91% accuracy of the original network.
Scalability. In the two-dimensional case, we showed an example where robustness around some input could be accurately disproven with a radius of ε = 0.3 which, according to Lemma 5.2, translates into a corresponding radius in the original input space. The TADS, restricted to the space of interest, had 51 nodes and could be handled quite easily. We repeat this experiment with the input image shown in Figure 17 and the six-dimensional neural network instead. The TADS resulting from this experiment possesses roughly 4600 nodes. This is still manageable computationally, but indicates the expected explosion in size.

Figure 16: A TADS representing the behavior of the classifier around x⃗₉. For readability, this TADS is constructed such that x⃗₉ corresponds to the vector (0, 0). The terminal node "1" represents a correct classification, the node "0" an incorrect one.

Related Work
The topic of robustness has been widely discussed in the machine learning community ever since it first gained attention in 2013 [SZS+13]. One line of research investigates heuristic methods that quickly and reliably find adversarial examples for modern neural networks, serving to understand how adversarial examples occur and therefore how they might be mitigated [CW17, GSS14]. As they are devised by the machine learning community, it is not surprising that these methods typically leverage the machine learning toolbox, using gradient descent and other training heuristics to find adversarial examples. Another natural topic with respect to robustness is constructing neural networks that are reliably robust after training. A typical approach to this is defensive distillation [PMW+16], which seeks to secure a previously trained neural network. This is achieved by using the outputs of the first neural network to train a second neural network with an equivalent architecture, a process called distilling. The additional information provided by the first neural network allows for efficient training in fewer training steps, reducing the need for large parameter values and therefore reducing the risk of adversarial attacks. Other approaches directly modify the training process to ensure robustness, usually by introducing additional regularization terms that are meant to steer the training process into a robust direction, often working in tandem with formal methods [ZSLG16, WCAJ18, HYY+18].
Closer to our approach are neural network verification approaches (for robustness). They can be split into two categories: approaches based on branch-and-bound tree search algorithms [Dak65, LNPT18] and approaches based on abstract interpretation [CC92].
Neural Network Verification - Tree Search. Much like SAT and SMT solvers, these approaches use a branch-and-bound tree search algorithm to find a counterexample to the property of interest. A critical part of this is finding an apt ReLU configuration, i.e., determining which neuron activation values are set to 0 by the ReLU activation function and which are not. This corresponds to finding a satisfiable path in a TADS that contains a counterexample, which makes TADS-based verification inherently a representative of this category.
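The notion of a ReLU configuration can be illustrated on a tiny network with random stand-in weights: fixing which hidden neurons are active makes the network affine, so each configuration corresponds to one branch of the search (or one TADS path).

```python
import numpy as np

rng = np.random.default_rng(3)
W1, b1 = rng.normal(size=(4, 2)), rng.normal(size=4)  # one hidden layer

def relu_config(x):
    """Activation pattern: which hidden neurons the ReLU leaves active.

    Fixing this pattern makes the network affine on the corresponding
    input region, which is exactly what branch-and-bound verifiers (and
    individual TADS paths) exploit.
    """
    return tuple(W1 @ x + b1 > 0)

# Distinct configurations met on a grid = linear regions touched by it.
xs = np.linspace(-2, 2, 50)
configs = {relu_config(np.array([u, w])) for u in xs for w in xs}
# With 4 hidden neurons there are at most 2**4 = 16 configurations.
assert len(configs) <= 16
```

The exponential number of such configurations in the neuron count is the shared worst-case complexity of tree search verifiers and TADS construction.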
Other examples include Reluplex [KBD+17], one of the earliest scalable neural network verifiers, and alpha-beta-CROWN [WZX+21], a modern method that can be regarded as the current state of the art [BLJ21]. Methods of this type differ mostly in the heuristics that guide their branching and bounding.
Tree search methods are accurate and leading in practice, but they tend to be more time-intensive than abstract-interpretation-based methods. Moreover, they are, much like TADS, naturally restricted to piecewise affine neural networks and cannot cover activation functions such as sigmoid or softmax.
Neural Network Verification - Abstract Interpretation. These neural network verifiers define an abstract interpretation of neural networks to attain an overapproximation of the reachable states that a neural network can output on a given input region [EGK20]. As these methods compute an overapproximation of the truly reachable states, they are sound but not complete, i.e., they might incorrectly state that a given property is violated when it is not. On the flip side, abstract interpretation verifiers are typically computationally quite efficient and extend to neural networks that are not piecewise affine. Examples of verifiers based on abstract interpretation include AI² [EGK20] and DeepPoly [SGPV19]. Our TADS-based approach naturally also applies to abstractly interpreted neural networks.

Conclusion
In this paper, we have applied TADS, a white-box representation of neural networks, to the problem of neural network robustness. To this end, we have introduced precondition projection and shown how to extend the argmax function, which is typically applied to the output of classification networks, to generate TADS that precisely describe a neural network's classification behavior in a given area around a fixed input point. Choosing the considered robustness region as precondition, robustness becomes equivalent to the property that the entire corresponding TADS collapses to a single node, which then characterizes the robust classification. If this is not the case, the resulting TADS explicitly represents the set of all adversarial examples.
This unique power of TADS-based robustness verification comes at the price of an exponential complexity, which we have proposed to mitigate via PCA-based dimensionality reduction, focusing the verification on the image of a low-dimensional PCA encoding. Three versions of this approach have been discussed:
- An approximative version that can be regarded as an elaborate search heuristic for adversarial examples,
- A transformational approach where the PLNN is extended by a preprocessing step defined by PCA-based autoencoding, which allows one to infer robustness of the 784-dimensional transformed network from the analysis of the corresponding low-dimensional PCA space, and
- An approach that is based on a modified learning process, specifically tailored to the corresponding PCA-based encoding. This method leverages the machine learning toolbox to improve the accuracy of the 784-dimensional transformed network while still allowing low-dimensional robustness verification.
We believe that dimensionality reduction, as illustrated in this paper for PCA, is key to achieving neural networks that are ready for verification. The challenge is to find dimensionality reduction techniques that maintain a high level of accuracy. In our experience, the success of such techniques hinges on characteristics of the application domain. We are optimistic that this approach will widen the scope of applications where neural networks are accepted.