1 Introduction

Deep neural networks (DNNs) [1, 2] have achieved tremendous success in recent years owing to their ability to infer highly nonlinear relations from data, learn accurate predictive models, and make smart decisions with little or no human intervention. Despite this success, the correctness of neural networks remains a major concern due to their complexity and lack of transparency. This is especially the case for safety- and security-critical applications where errors and biases can have serious undesired consequences, such as in medical diagnosis [3], self-driving cars [4], or financial systems [5]. Techniques that can establish mathematical guarantees about the behavior of neural networks are therefore urgently needed [6]. An effective approach to address this issue is the use of automatic verification techniques, which can either formally prove that the network adheres to a specified property or return a concrete input (witness) demonstrating a violation of the property [7].

Robustness and fairness are two important properties of neural networks. Robustness refers to the neural network’s ability to make accurate predictions even in the presence of input perturbations. In particular, a robust neural network is able to produce accurate results without being overly sensitive to small changes in the input. Fairness, on the other hand, refers to a neural network’s ability to make unbiased and equitable predictions, particularly in cases where the input data may contain sensitive attributes such as gender, race, or age. A neural network that is not fair may produce biased results that discriminate against certain groups, which can have serious ethical and social implications.

Local Robustness and Fairness provide the dominant perspective in verification and adversarial testing of DNNs. Local robustness [8,9,10,11] intuitively requires the DNN to have the following property with respect to a given input x – it has to make the same prediction for the input x as for all the points in the vicinity of x. Local fairness [8, 12, 13] is defined in a similar way, with the distance metric used for the inputs being the main difference. Both properties can be formalized as safety properties. This has led to the design of a variety of SMT-based techniques [10, 14, 15], which encode the neural networks and the property to be verified as an SMT solving problem in order to enable automated verification. Other works approach the verification problem using static analysis [16,17,18,19] which over-approximates DNN executions, thereby compromising precision for higher scalability. Alternative verification techniques include mixed-integer programming [20,21,22] and modified simplex algorithms [9, 23].

Figure 1 (a) illustrates the properties of local robustness. It shows the classification of an input \(\vec {x}\) that includes two (continuous) features \(x_1\) and \(x_2\). The pair of purple points is not a counterexample to robustness, as both inputs lie within the same class. The green and blue points, however, represent counterexamples to local robustness, as they fall on different sides of the decision boundaries.

The above example shows two major limitations of local properties. First, there are always inputs arbitrarily close to the decision boundary, which then constitute counterexamples to local robustness. Second, local robustness is defined only for a specific input. Consequently, it does not provide any guarantees for any other input. It follows that the robustness of the entire neural network cannot be assessed with local robustness only.

Fig. 1.

(a) Local, (b) partitioned-global and (c) our confidence-based robustness. \(x_1\) and \(x_2\) denote continuous input points, while \(x_3\) denotes a categorical input in the partitioned-global approach (b). The shades of gray in (c) depict the level of confidence of the neural network with respect to the given inputs – dark gray denotes high while white denotes low confidence level. The neural network is robust to the pair of purple points in all three cases (a), (b) and (c). The neural network is not robust for the pair of blue points in the case of local and partitioned-global (b) robustness, but is robust according to our definition (c). Finally, the neural network is not robust for the pair of green points according to both the local and our confidence-based global robustness (a) and (c), but is robust with respect to the partitioned-global robustness (b). The global partitioning method does not catch the counterexample, because the two green points are in separate partitions. (Color figure online)

Global Robustness and Fairness. The limitation of the local definition for robustness and fairness indicates the need for a global property that evaluates the expected input/output relation over all pairs of inputs.

We first observe that global robustness and fairness of DNNs are hyperproperties, i.e. properties that relate multiple executions of the model. Khedr et al. [24] and Biswas et al. [25] recently introduced the first verification techniques for hyperproperties in DNNs. These works assume that the inputs contain categorical variables. Based on this strong assumption, these two approaches partition the input space based on categorical features to avoid comparing inputs close to decision boundaries, which would lead to a non-satisfiable property. This is illustrated in Fig. 1 (b). Here, we assume that \(\vec {x}\) includes a categorical feature \(x_3\) in addition to the continuous features \(x_1\) and \(x_2\). The left (right) part of Fig. 1 (b) depicts classes and inputs in the partition based on the categorical feature \(x_3\) with value v (\(v'\)). Consequently, only pairs of inputs belonging to the same partition are compared. Inputs belonging to two different partitions (e.g. green points in Fig. 1 (b)) deviate in at least one categorical feature and can hence be assumed to violate the premise that these inputs are “close”. According to this approach, a classification in a secure network can only change with different categorical values. Any two points that lie in the same partition but belong to different classes (e.g. the pair of blue points in Fig. 1 (b)) are considered counterexamples to the global property. This leads to a strong limitation that does not admit two classes to result from continuous inputs only, as typically required for robustness. As a result, the two approaches [24, 25] address only verification of global fairness.

Our Contributions. Inspired by the work of Chen et al. [26] on properties of rule-based security classifiers, we adopt a confidence-based view on global robustness and fairness for DNNs. The idea is to compare all input pairs that are (1) sufficiently close and (2) such that at least one of them yields a high-confidence classification. This intuitive definition expects robust and fair DNNs to generate outputs with low confidence near the decision boundary.

We therefore propose the confidence-based 2-safety property, the first definition that unifies global robustness and fairness for DNNs. Our definition highlights the hyperproperty nature of global properties and treats the confidence of the DNN as a first-class citizen.

We briefly illustrate the intuition behind our confidence-based 2-safety property, with a focus on robustness, in Fig. 1 (c), where the input space is colored in shades of gray and every gray value corresponds to a confidence level of the network. Darker shades of gray represent higher confidence for the given classification. Our definition captures two reasonable assumptions: (1) continuous inputs can also trigger changes in classification, and (2) the confidence of the neural network at a decision boundary must be relatively low. In essence, our definition requires that for any input classified with high confidence, all inputs in its \(\epsilon \)-neighborhood yield the same class (e.g. the two purple points in Fig. 1 (c)). This notion discards inputs near the decision boundaries as counterexamples, as long as they result in outputs with low confidence (e.g. the two blue points in Fig. 1 (c)). Systems satisfying the 2-safety property hence guarantee that input points classified with high confidence are immune to adversarial perturbation attacks. In Fig. 1 (c), the pair of green inputs witnesses a violation of the confidence-based 2-safety property: the two points lie in different classes and one of them is classified with high confidence.

This confidence-based view makes a conceptual change to the definition of global properties, as it requires relating not only inputs, but also confidence values to the outputs. This conceptual change poses a significant challenge to the verification problem because checking a confidence-based property on a DNN requires reasoning about its softmax layer, which is not supported by the state-of-the-art DNN verification tools [23, 27,28,29,30,31]. To solve this problem, we develop the first verification method that supports DNNs with softmax, in which we use a linearized over-approximation of the softmax function. We then combine it with self-composition [32] in order to verify confidence-based 2-safety properties. We formally prove the soundness of our analysis technique, characterizing, in particular, the error bounds of our softmax over-approximation.

We demonstrate our approach in Marabou [23], a state-of-the-art analysis tool for local robustness based on a modified simplex algorithm, which we extend to support global robustness and global fairness. We show that by combining our method with binary search, we can go beyond verification and synthesize the minimum confidence for which the DNN is globally robust or fair. We finally conduct a performance evaluation on four neural networks trained with publicly available datasets to demonstrate the effectiveness of our approach in identifying counterexamples and proving global robustness and fairness properties.

2 Background

2.1 Feed-Forward Neural Networks

In feed-forward neural networks, data flows uni-directionally, i.e., there are no backward edges. An input layer receives the inputs, which are propagated through one or more hidden layers to the output layer [33]. A layer consists of multiple neurons, each connected to the neurons in the next layer via a set of weights, and each layer also has an associated bias. The selection of weights and biases is crucial to the performance of a neural network and is performed during the training phase. Outputs are computed by combining the inputs with the weights and biases, applying the activation functions, and propagating the results through the network [34].

Formally, a feed-forward neural network \(f: \mathbb {R}^m \rightarrow \mathbb {R}^n\) is modeled as a directed acyclic graph \(G=(V,E)\) that consists of a finite set of nodes V and a set of edges \(E\subseteq V\times V\). The nodes V are partitioned into l layers \(V^i\) with \(1\le i\le l\), where \(V^1\) and \(V^l\) represent the input and output layers, and \(V^2,\ldots , V^{l-1}\) represent the hidden layers, respectively. We use \(v_{i,j}\) to denote node j in layer i. The edges E connect nodes in \(V^{i-1}\) with their successor nodes in \(V^i\) (for \(1< i \le l\)).

Each node \(v_{i,j}\) has an input and an output, where the latter is derived from the former by means of an activation function. We use \(\textrm{in} (v_{i,j})\) and \(\textrm{out} (v_{i,j})\) to denote the input and output value of node \(v_{i,j}\), respectively. The output is determined by

$$\begin{aligned} \textrm{out} (v_{i,j}) = a _{i,j}(\textrm{in} (v_{i,j}))\,, \end{aligned}$$
(1)

where \(a _{i,j}\) is the activation function. The input to node \(v_{i,j}\) in layer \(V^i\) is determined by the outputs of its predecessors \(v_{i-1,1},\ldots ,v_{i-1,k}\) in \(V^{i-1}\) and weights associated with the edges \((v_{i-1,k},v_{i,j})\in E\) for \(1\le k\le \vert V^{i-1}\vert \):

$$\begin{aligned} \textrm{in} (v_{i,j})=\sum _{k=1}^{\vert V^{i-1}\vert } \textrm{weight} ((v_{i-1,k},v_{i,j})) \cdot \textrm{out} (v_{i-1,k}) \end{aligned}$$

The values of the nodes in the input layer \(V^1\) are determined by the input \(\vec {x}\) to \(f(\vec {x})\), i.e.,

$$\begin{aligned} (\textrm{in} (v_{1,1}),\ldots ,\textrm{in} (v_{1,m}))=\vec {x}\,. \end{aligned}$$

The output of the final layer \(V^l\) is then computed by propagating the inputs according to the activation functions (see Eq. 1 above). Consequently, a graph G with \(\vert V^1\vert =m\) input and \(\vert V^l\vert =n\) output nodes induces a function \(f:\mathbb {R}^m \rightarrow \mathbb {R}^n\) whose semantics is determined by the activation functions.

In this paper, we concentrate on the Rectified Linear Unit (\(\textrm{ReLU}\)) activation function, which is frequently applied to the hidden layers of deep neural networks. For a (scalar) input value x, \(\textrm{ReLU}\) returns the maximum of 0 and x, i.e.

$$\begin{aligned} \textrm{ReLU} (x)= \max (0, x)\,. \end{aligned}$$
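For concreteness, the following is a minimal sketch, with hypothetical weights of our own choosing, of the forward propagation defined by Eq. 1 and the weighted-sum rule above, using ReLU in the hidden layer (bias terms are omitted, as in the formal definition):

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def forward(x, weights, activations):
    """Propagate input x through layers V^2..V^l.

    weights[i] is the matrix whose entry (j, k) is weight((v_{i,k}, v_{i+1,j}));
    activations[i] is the activation a_{i+1,j}, applied element-wise.
    """
    out = np.asarray(x, dtype=float)        # out(V^1) = x
    for W, act in zip(weights, activations):
        z = W @ out                          # in(v_{i,j}) = sum_k weight * out(v_{i-1,k})
        out = act(z)                         # out(v_{i,j}) = a_{i,j}(in(v_{i,j})), cf. Eq. 1
    return out                               # out(V^l)

# Toy 2-3-2 network with illustrative weights (not from the paper).
W1 = np.array([[0.5, -1.0], [1.2, 0.3], [-0.7, 0.8]])
W2 = np.array([[1.0, -0.5, 0.2], [-0.3, 0.9, 1.1]])
print(forward([1.0, 2.0], [W1, W2], [relu, lambda z: z]))
```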

In neural networks that are used as classifiers and map an input \(\vec x\) to one of n labels in a set of classes C, the final layer typically employs a \(\textrm{softmax}\) function to ensure that the output represents normalized probabilities corresponding to each of the n classes. Mathematically,

$$\begin{aligned} \textrm{softmax} (\vec {z})_i = \frac{e^{z_i}}{\sum _{j=1}^{n} e^{z_j}} \end{aligned}$$
(2)

where \(\vec {z}\) represents the values \(\textrm{out} (v_{l-1,i})\) for \(1 \le i \le n\) and \(n=\vert V^{l-1}\vert \), and \(z_i\) is the \(i^{\text {th}}\) element in \(\vec {z}\). This induces a function \(y: \mathbb {R}^n \rightarrow [0,1]^n\) mapping every output of \(V^{l-1}\) to a confidence score in the range [0, 1]. Consequently, \(f(\vec {x})\) outputs a probability distribution over the possible labels in C, where each component of the output vector represents the probability of input \(\vec {x}\) belonging to the corresponding class. We use \(\textrm{conf} (f(\vec x))\) to refer to the highest probability value in the \(\textrm{softmax}\) layer of \(f(\vec x)\) and call it the confidence, i.e.,

$$\begin{aligned} \textrm{conf} (f(\vec {x}))=\max (\textrm{out} (v_{l,1}),\ldots ,\textrm{out} (v_{l,n})) \end{aligned}$$
(3)

Finally, a function \(\textrm{class}: \mathbb {R}^n \rightarrow C\) maps the output of f to the class in C corresponding to the highest probability in \(f(\vec x)\):

$$\begin{aligned} \textrm{class} (f(\vec {x})) = \mathop {\mathrm {arg\,max}}\limits _{1\le i\le n}(\textrm{out} (v_{l,i})) \end{aligned}$$
(4)
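As a small illustration (ours, not taken from the paper), the operators of Eqs. 2-4 can be computed as follows; note that the returned class index is 0-based:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - np.max(z))       # shifting by the max is for numerical stability only
    return e / e.sum()

def conf(probs):
    return float(np.max(probs))      # Eq. 3: highest probability value

def classify(probs):
    return int(np.argmax(probs))     # Eq. 4: index of the highest probability (0-based)

z = np.array([2.0, 1.0, 0.1])        # outputs of layer V^{l-1}
p = softmax(z)
print(p, conf(p), classify(p))       # confidence ~ 0.66, class 0
```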

2.2 Hyperproperties

Hyperproperties [36] are a class of properties that capture relationships between multiple execution traces. This is in contrast to traditional properties, which are evaluated over individual traces.

To define traces in the context of feed-forward neural networks, we extend our notation \(\textrm{out} \) to layers as follows:

$$\begin{aligned} \textrm{out} (V^i)=(\textrm{out} (v_{i,1}),\ldots ,\textrm{out} (v_{i,k})) \end{aligned}$$

where \(k=\vert V^i\vert \). Let \(\textrm{in} (V^i)\) be defined similarly. The corresponding trace \(\pi \) for \(f(\vec x)\) is then formally defined as

$$\begin{aligned} \pi = \textrm{in} (V^1),\textrm{out} (V^1),\ldots ,\textrm{in} (V^l),\textrm{out} (V^l) \end{aligned}$$

where \(\textrm{in} (V^1)=\vec {x}\).

Note that each execution is entirely determined by the input value \(\vec x\) (assuming that the function f implemented by the network is deterministic). Quantifying over traces \(\pi \) of \(f(\vec {x})\) hence corresponds to quantifying over the corresponding inputs \(\vec {x}\). A traditional safety property would then quantify over the inputs \(\vec {x}\), e.g.

$$\begin{aligned} \forall \vec x \,.\,\textrm{conf} (f(\vec x))\ge \kappa , \end{aligned}$$

stating that the confidence of each classification of the network should be larger than a threshold \(\kappa \). Another example of a traditional safety property is local robustness, given in Definition 1 in Subsect. 2.4.

A hyperproperty, on the other hand, refers to, and quantifies over, more than one trace. An example would be

$$\begin{aligned} \forall \vec {x},\vec {x}' . \frac{\vert f(\vec {x})_i-f(\vec {x}')_i\vert }{||\vec {x}-\vec {x}'||}\le K_i , 1\le i \le n \end{aligned}$$
(5)

where \(f(\vec {x})_i\) denotes \(\textrm{out} (v_{l,i})\). Equation 5 states that \(K_i\) bounds the Lipschitz constant of \(f(\vec {x})_i\). A hyperproperty central to this paper is global robustness, defined in Definition 2 in Subsect. 2.4.

Hyperproperties are used to capture important properties that involve multiple inputs, such as robustness and fairness. By verifying hyperproperties of neural networks, we can ensure that they behave correctly across all possible input traces.

2.3 Relational Verification and Self-composition

Hyperproperties are verified by means of so-called relational verification techniques: the idea is to verify whether k program executions jointly satisfy a given property [37], expressed as invariants over the inputs and outputs of these executions. Several security properties (e.g., information flow) can be expressed by relating two executions of the same program that differ in their inputs: such properties are called 2-safety properties. Global robustness in neural networks can also be seen as a 2-safety property [8].

2-safety properties can be verified in a generic way by self-composition [37]: the idea is to compose the program with itself and to relate the two executions. In the context of neural networks, the self-composition of a network f is readily defined as a function over

$$\begin{aligned} f(\vec {x})\times f(\vec {x}') = \lambda (\vec {x},\vec {x}')\,.\,(f(\vec {x}),f(\vec {x}')) \end{aligned}$$
(6)

where \((\vec {x},\vec {x}')\) denotes the concatenation of the vectors \(\vec {x}\) and \(\vec {x'}\) and \(\lambda \vec {x}\,.\,f(\vec x)\) denotes the lambda term that binds \(\vec {x}\) in \(f(\vec {x})\). The underlying graph \(G=(V,E)\) is simply duplicated, i.e., we obtain a graph \(G\times G'=(V\cup V', E\cup E')\) where \(V'\) and \(E'\) are primed copies of V and E.

A counterexample to a universal 2-safety property over the self-composition of f comprises a pair of traces of f witnessing the property violation.
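As a minimal sketch (our own helper names, not part of Marabou or the paper's implementation), the self-composition of a feed-forward network given as a list of weight matrices can be realized with block-diagonal weights, so that the primed copy never interacts with the unprimed one:

```python
import numpy as np

def relu(x):
    return np.maximum(0.0, x)

def self_compose(weights):
    """Each layer of the product network G x G' gets a block-diagonal weight matrix:
    the primed copy of the variables evolves independently of the unprimed copy."""
    return [np.block([[W, np.zeros_like(W)],
                      [np.zeros_like(W), W]]) for W in weights]

def product_forward(x, x_prime, weights):
    out = np.concatenate([x, x_prime])       # concatenated input (x, x') as in Eq. 6
    prod = self_compose(weights)
    for W in prod[:-1]:
        out = relu(W @ out)                   # shared architecture, duplicated variables
    out = prod[-1] @ out                      # linear output layer
    half = out.size // 2
    return out[:half], out[half:]             # (f(x), f(x'))

# toy usage with the 2-3-2 network from the earlier sketch:
W1 = np.array([[0.5, -1.0], [1.2, 0.3], [-0.7, 0.8]])
W2 = np.array([[1.0, -0.5, 0.2], [-0.3, 0.9, 1.1]])
print(product_forward([1.0, 2.0], [1.0, 2.1], [W1, W2]))
```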

2.4 Robustness and Fairness

Robustness in neural networks refers to the ability of a model to perform consistently in the presence of small perturbations of the input data. The common approach to address robustness in neural networks is to define it as a local robustness [38] property. For an input \(\vec x\), a neural network is locally robust if it yields the same classification for \(\vec x\) and all inputs \(\vec x'\) within distance \(\epsilon \) from \(\vec x\) [39]:

Definition 1

(Local Robustness). A model f is locally \(\epsilon \)-robust at point \(\vec x\) if

$$\forall \vec {x}'\,.\, ||\vec {x}-\vec {x}'|| \le \epsilon \rightarrow \textrm{class} (f(\vec {x})) = \textrm{class} (f(\vec {x}'))$$

Local robustness, therefore, is defined only for inputs within a distance \(\epsilon \) of a specific \(\vec x\) and thus does not provide global guarantees. Here, \(||\cdot ||\) denotes the distance metric used over the input space. Intuitively, global robustness tackles this problem by requiring that the local robustness property hold for every input in the input space [8]. Definition 2 gives the general definition of global robustness used in [26, 40]. It essentially states that any two input points within distance \(\epsilon \) of each other are mapped to the same class.

Definition 2

(Global robustness). A model f is globally \(\epsilon \)-robust if

$$\begin{aligned} \forall \vec x, \vec x'\,.\, ||\vec x - \vec x'|| \le \epsilon \rightarrow \textrm{class} (f(\vec x)) = \textrm{class} (f(\vec x'))\, \end{aligned}$$

Clearly, global robustness as formalized in Definition 2 makes sense only for selected distance metrics, which in particular avoid comparing inputs close to the decision borders. For instance, [40] addresses this by introducing an additional class \(\bot \) to which \(\textrm{class} (f(\vec x))\) evaluates whenever the difference between the highest and second-highest probability falls below a certain threshold (determined by the Lipschitz constants of f). The global robustness requirement is then relaxed at these points.

Definition 3

(Global fairness). A model f is said to be globally fair if

$$\begin{aligned} \forall \vec {x} = (x_s, \vec {x_n}), \vec {x'} = (x_s', \vec {x_n'})\,.\,\; ||\vec {x_n} - \vec {x'_n}|| \le \epsilon \; \wedge \; (x_s \ne x_s') \rightarrow \textrm{class} (f(\vec x)) = \textrm{class} (f(\vec x')) \end{aligned}$$

where \(x_s\) and \(x_n\) are the sensitive and non-sensitive attributes of \(\vec {x}\), respectively.

[24, 25] address a similar problem, which arises in the context of fairness, by partitioning the input space based on categorical features. In general, if the input to a decision-making neural network comprises certain sensitive attributes, say age or gender, the network is said to be fair if the sensitive attributes do not influence its decisions [8]. Definition 3 gives the general definition of global fairness used in [24, 25].

Ensuring fairness in neural networks is important because these models are increasingly being used in decision-making processes that can have significant impacts on peoples’ lives. For example, a hiring algorithm that discriminates against certain groups of job applicants based on their race or gender could perpetuate existing biases and inequalities in the workplace [41].

3 Confidence Based Global Verification of Feed-Forward Neural Networks

We now formalize, in Definition 4, the confidence-based 2-safety property, the first definition that unifies global robustness and fairness for DNNs. It is a hyperproperty that takes the confidence of the decision into account when checking the safety of the network. Before we give the actual definition, we introduce additional notation. Given an input \(\vec {x} = (x_1, \ldots , x_n)\), we assume that every component \(x_i\) is either a categorical or a real value. We define the distance \(d(x_i, x'_i)\) as \(|x_i - x'_i|\) when \(x_i\) is real-valued. We use instead the following distance:

$$ d(x_i, x'_i) = {\left\{ \begin{array}{ll} 0, &{} \text {if}\ x_i = x'_i \\ 1, &{} \text {otherwise} \end{array}\right. } $$

when \(x_i\) is a categorical value. We define \(cond (\vec {x}, \vec {x}', \vec {\epsilon })\) as a (generic) Boolean condition that relates inputs \(\vec {x}\) and \(\vec {x}'\) to a tolerance vector \(\vec {\epsilon }\).

Definition 4

(Confidence-based global 2-safety). A model f is said to be globally 2-safe for confidence \(\kappa > 0\) and tolerance \(\vec {\epsilon }\) iff

$$\begin{aligned} \forall \vec {x}, \vec {x}'\,.\, \textrm{cond} (\vec {x}, \vec {x}', \vec {\epsilon })\; \wedge \; \textrm{conf} (f(\vec {x})) > \kappa \implies \textrm{class} (f(\vec {x})) = \textrm{class} (f(\vec {x}')) \end{aligned}$$

Next, we instantiate the above 2-safety property for confidence-based global robustness and fairness.

For confidence-based global robustness, \(\textrm{cond}\) is defined as:

$$\textrm{cond} (\vec {x}, \vec {x}', \vec {\epsilon }) = \bigwedge _{ i \in [1,n]} d(x_i, x'_i) \le \epsilon _i$$

For confidence-based global fairness, \(\vec {x}\) is split into sensitive attributes \(\vec {x_s}\) and non-sensitive attributes \(\vec {x_n}\), and \(\textrm{cond}\) is defined as:

$$\begin{aligned} \begin{aligned} \textrm{cond} (\vec {x}, \vec {x}', \vec {\epsilon }) = \bigwedge _{x_i \in \vec {x_s}} d(x_i, x'_i) > 0 \; \wedge \bigwedge _{x_i \in \vec {x_n}} d(x_i, x'_i) \le \epsilon _i \end{aligned} \end{aligned}$$

where for any categorical \(x_i \in \vec {x_n}\), its associated tolerance threshold \(\epsilon _i=0.5\).

Intuitively, confidence-based global fairness ensures that for any data instance \(\vec{x}\) classified with confidence above \(\kappa\), no other data instance \(\vec{x}'\) that differs from \(\vec{x}\) only in the value of the sensitive attribute (e.g. age, gender, ethnicity) is classified to a different class.
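The following sketch, in our own notation rather than the paper's, spells out the distance d and the two instantiations of cond; `sensitive` and `categorical` are index sets that the user has to supply, and for categorical non-sensitive attributes the tolerance is set to 0.5 as stated above:

```python
def d(xi, xi_prime, is_categorical):
    """Distance of Sect. 3: 0/1 indicator for categorical values, |.| otherwise."""
    if is_categorical:
        return 0.0 if xi == xi_prime else 1.0
    return abs(xi - xi_prime)

def cond_robust(x, x_prime, eps, categorical):
    """cond for confidence-based global robustness: all components eps-close."""
    return all(d(x[i], x_prime[i], i in categorical) <= eps[i] for i in range(len(x)))

def cond_fair(x, x_prime, eps, sensitive, categorical):
    """cond for confidence-based global fairness: sensitive attributes differ,
    non-sensitive ones are eps-close (eps = 0.5 for categorical non-sensitive ones)."""
    diff_sensitive = all(d(x[i], x_prime[i], i in categorical) > 0 for i in sensitive)
    close_rest = all(d(x[i], x_prime[i], i in categorical) <= eps[i]
                     for i in range(len(x)) if i not in sensitive)
    return diff_sensitive and close_rest
```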

As defined in Sect. 2, f(x) represents the feed-forward neural network, which maps inputs to classes with corresponding confidence scores. By introducing the threshold \(\kappa \), our definition effectively ignores classification mismatches that arise from decisions with low confidence. The rationale is as follows:

  • Different classifications close to decision boundaries need to be allowed, as safety can otherwise only be satisfied by degenerate neural networks that map all inputs to a single label.

  • On the other hand, input points classified with a high confidence should be immune to adversarial perturbations and also uphold fairness.

3.1 Encoding 2-Safety Properties as Product Neural Network

In this section, we reduce checking of the 2-safety hyperproperty in Definition 4 to a safety property over a single trace. Given a neural network f (as defined earlier in Sect. 2), the product neural network is formed by composing a copy of the original neural network with itself. Checking 2-safety then reduces to checking an ordinary safety property for the self-composed neural network that consists of two copies of the original neural network, each with its own copy of the variables.

The product neural network is now treated as the model to be verified. A product network allows the reduction of a hyperproperty to a trace property, thereby reducing the problem of hyperproperty verification to a standard verification problem, which can be solved using an existing standard verification technique.

Table 1. Marabou’s piecewise linear constraints

Product Neural Network. We encode \(f(\vec {x})\) using piecewise linear constraints (see Table 1). Each node \(v_{i,j}\) is represented by two variables \(\textsf{in}_{i,j}\) and \(\textsf{out}_{i,j}\) representing its input and output, respectively. Inputs and outputs are related by the following constraints:

$$\begin{aligned} \textsf{in}_{i,j}=\sum _{k=1}^{\vert V^{i-1}\vert } w_{i,j}^{i-1,k} \cdot \textsf{out}_{i-1,k}\quad \wedge \quad \textsf{out}_{i,j} = a_{i,j}(\textsf{in}_{i,j}) \end{aligned}$$

where \(w_{i,j}^{i-1,k}\) is the weight associated with the edge \((v_{i-1,k},v_{i,j})\) and \(a_{i,j}\) is the activation function of node \(v_{i,j}\). To encode the self-composition, we duplicate all variables and constraints by introducing primed counterparts \(\textsf{in}'_{i,j}\) and \(\textsf{out}'_{i,j}\) for \(\textsf{in}_{i,j}\) and \(\textsf{out}_{i,j}\).

Transfer Functions and Operators. \(\textrm{ReLU}\)s can be readily encoded using \(\textsf{out}_{i,j}=\textrm{ReLU} (\textsf{in}_{i,j})\). There is, however, no direct way to encode \(\textrm{softmax}\) using the constraints in Table 1, hence we defer the discussion to Subsect. 3.2.

The \(\textrm{conf} \) operator can be implemented using the \(\max \) constraint (cf. Eq. 3). The operator \(\textrm{class} \) as well as the implication, on the other hand, are not necessarily supported by state-of-the-art static analysis tools for DNNs. For instance, they are not supported by Marabou [23], on which we base our implementation. For reference, Table 1 illustrates the linear constraints supported by Marabou. We thus introduce an encoding, which we detail below.

First, checking the validity of the implication in Definition 4 can be reduced to checking the unsatisfiability of

$$\begin{aligned} \textrm{cond} (\vec {x}, \vec {x'}, \vec {\epsilon }) \wedge \textrm{conf} (f(\vec {x})) > \kappa \wedge \textrm{class} (f(\vec {x})) \ne \textrm{class} (f(\vec {x}')) \end{aligned}$$
(7)

However, the grammar in Table 1 provides no means to encode disequality or \(\textrm{class} \) (which returns the index of the largest element of a vector). To implement disequality, we perform a case split over all \(n=\vert V^l\vert \) labels by instantiating the encoding of the entire network over \(\textsf{out}_{l,i}\) and \(\textsf{out}'_{l,i}\) for \(1\le i\le n\). To implement this in Marabou, we execute a separate query for every case.

To handle the operator \(\textrm{class}\), we can encode the disequality \(\textrm{class} (f(\vec {x})) \ne \textrm{class} (f(\vec {x}'))\) as:

$$\begin{aligned} \ldots \;\wedge \;\overbrace{\max (\textsf{out}_{l,1},\ldots ,\textsf{out}_{l,n})}^{\textrm{conf} (f(\vec {x}))} > \kappa \; \wedge \; \left( \max (\textsf{out}_{l,1},\ldots ,\textsf{out}_{l,n})-\textsf{out}_{l,i} = 0\right) \; \wedge \; \left( \max (\mathsf {out'}_{l,1},\ldots ,\mathsf {out'}_{l,n})-\mathsf {out'}_{l,i} \ne 0\right) \end{aligned}$$
(8)

The constraint \((\max (\textsf{out}_{l,1},\ldots ,\textsf{out}_{l,n})-\textsf{out}_{l,i}=0)\) ensures that \(\textsf{out}_{l,i}\) corresponds to the largest element in \(f(\vec {x})\) (and hence that \(\textrm{class} (f(\vec {x}))=i\)). Consequently, if \((\max (\mathsf {out'}_{l,1},\ldots ,\mathsf {out'}_{l,n})-\mathsf {out'}_{l,i} \ne 0)\), then we can conclude that \(\textrm{class} (f(\vec {x}'))\ne i\) and hence the safety constraint is violated.

Since Marabou does not support the disequality operator, we replace the constraint \((\max (\mathsf {out'}_{l,1},\ldots ,\mathsf {out'}_{l,n})-\mathsf {out'}_{l,i} \ne 0)\) by two separate queries, one with \((\max (\mathsf {out'}_{l,1},\ldots ,\mathsf {out'}_{l,n})-\mathsf {out'}_{l,i} < 0)\) and one with \((\max (\mathsf {out'}_{l,1},\ldots ,\mathsf {out'}_{l,n})-\mathsf {out'}_{l,i} > 0)\). If either query is satisfiable, we have found a counterexample; if both are unsatisfiable, the disequality cannot hold and the property is established for class i.
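To make the case split concrete, here is a schematic sketch of the query generation. The function `check_sat` and the constraint strings are placeholders of ours that merely mirror formula (8); the actual Marabou API calls differ.

```python
def two_safety_queries(n, kappa):
    """One pair of queries per candidate class i (cf. Eq. 8): class(f(x)) = i with
    confidence above kappa, while class(f(x')) != i, split into two strict
    inequalities because the solver offers no disequality constraint."""
    queries = []
    for i in range(1, n + 1):
        base = [
            "cond(x, x_prime, eps)",                              # input relation
            f"max(out_l_1..out_l_{n}) > kappa",                   # conf(f(x)) > kappa
            f"max(out_l_1..out_l_{n}) - out_l_{i} = 0",           # class(f(x)) = i
        ]
        queries.append(base + [f"max(out'_l_1..out'_l_{n}) - out'_l_{i} < 0"])
        queries.append(base + [f"max(out'_l_1..out'_l_{n}) - out'_l_{i} > 0"])
    return queries

def verify(check_sat, n, kappa):
    # The 2-safety property holds iff every query is unsatisfiable.
    return all(not check_sat(q) for q in two_safety_queries(n, kappa))
```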

While the above transformation is equivalence preserving, the encoding of \(\textrm{softmax}\) requires an approximation, described in the following subsection.

3.2 Softmax Approximation

Softmax in Terms of Max and Sig. We can approximate softmax using a \(\max \) operator and a sigmoid function as follows. Consider \({\textrm{softmax} (\vec {z})_i}\) (cf. Eq. 2), for i = 1,

$$\begin{aligned} \textrm{softmax} (\vec {z})_1 &= \frac{1}{1+ (e^{z_2} + \cdots + e^{z_n})\,e^{-z_1} } \end{aligned}$$
(9)
$$\begin{aligned} &= \frac{1}{1+ e^{\log (e^{z_2} + \cdots + e^{z_n})}\,e^{-z_1}} = \frac{1}{1+ e^{-z_1 + \log (e^{z_2} + \cdots + e^{z_n})}} \end{aligned}$$
(10)

We can now generalize the derivation in (9)-(10) to an arbitrary index i:

$$\begin{aligned} \textrm{softmax} (\vec {z})_i = \frac{1}{1+ e^{-z_i + \log (\sum _{j\ne i}^{n} e^{z_j})}} = \textrm{Sig} \Big (z_i - \mathop {\textrm{LSE}}_{j\ne i}(z_j)\Big ) \end{aligned}$$
(11)

where LSE (the log-sum-exp) is:

$$\mathop {\textrm{LSE}}_{j\ne i}(z_j) = \log \Big (\sum _{j=1, j\ne i}^{n} e^{z_j}\Big ) \quad \text{ and } \quad \textrm{Sig} (x) = \frac{1}{1+ e^{-x}}$$

We know from [42] that LSE is bounded:

$$\begin{aligned} \max _{1\le i\le n} (z_i) \;\le \; \mathop {\textrm{LSE}}_{1\le i\le n} (z_i)\;\le \; \max _{1\le i\le n} (z_i) +\log (n) \end{aligned}$$
(12)

with \(\max _{1\le i\le n}(z_i)=\max (z_1, \ldots , z_n)\); in particular, when \(z_1=\cdots =z_n\), we have:

$$\begin{aligned} \mathop {\textrm{LSE}}_{1\le i\le n} (z_i) = \max _{1\le i\le n} (z_i) +\log (n) \end{aligned}$$
(13)

Then softmax has as lower bound:

$$\begin{aligned} \textrm{softmax} (\vec {z})_i \ge \textrm{Sig} \Big (z_i - \max _{j\ne i}(z_j) - \log (n-1)\Big ) \end{aligned}$$
(14)

and as upper bound:

$$\begin{aligned} \textrm{softmax} (\vec {z})_i \le \textrm{Sig} \Big (z_i - \max _{j\ne i}(z_j)\Big ) \end{aligned}$$
(15)

When \(n=2\), the softmax is equivalent to the sigmoid:

$$\begin{aligned} \textrm{softmax} (\vec {z})_1 = \textrm{Sig} (z_1 - z_2) \quad \text{ and } \quad \textrm{softmax} (\vec {z})_2 = \textrm{Sig} (z_2 - z_1) \end{aligned}$$
(16)
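The sandwich bounds (14) and (15) are easy to sanity-check numerically (for \(n=2\) they collapse to the identity (16)); the snippet below is ours and serves only as an illustration.

```python
import numpy as np

def sig(x):
    return 1.0 / (1.0 + np.exp(-x))

rng = np.random.default_rng(0)
for _ in range(1000):
    n = int(rng.integers(2, 6))
    z = rng.normal(size=n) * 5
    sm = np.exp(z) / np.exp(z).sum()
    for i in range(n):
        others = np.delete(z, i)
        lower = sig(z[i] - others.max() - np.log(n - 1))   # bound (14)
        upper = sig(z[i] - others.max())                    # bound (15)
        assert lower - 1e-12 <= sm[i] <= upper + 1e-12
print("bounds (14)-(15) hold on all sampled logits")
```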

Now that we know how to approximate a softmax using a sigmoid and max, we need to find a piece-wise linear approximation of sigmoid since sigmoid is also a non-linear exponential function.

Piece-Wise Approximation of Sigmoid. We approximate the sigmoid as a piece-wise linear function using the Remez exchange algorithm [43]. The Remez algorithm is an iterative algorithm that finds simpler approximations to functions; it minimizes the maximum absolute difference between the approximation and the actual function. The algorithm takes a maximum acceptable error \(\delta \) and generates l linear segments approximating the sigmoid function such that the error is less than \(\delta \). We apply the Remez algorithm to the sigmoid on the interval \([\textrm{Sig} ^{-1}(\delta ),\textrm{Sig} ^{-1}(1-\delta )]\), where \(\textrm{Sig} ^{-1}\) is the inverse of the sigmoid, i.e., the logit function \(\textrm{Sig} ^{-1}(y)=\textrm{logit}(y)=\log (y/(1-y))\). For example, if the user sets \(\delta \) to 0.0006, then the input domain for the algorithm is \([-7.423034723582278, 7.423034723582278]\). This yields the piece-wise linear approximation \(\mathrm{\widehat{Sig}}\) of the sigmoid.

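To make this step concrete, the following is a simplified stand-in, not the Remez exchange algorithm itself: it interpolates the sigmoid with chords over \([\textrm{Sig}^{-1}(\delta), \textrm{Sig}^{-1}(1-\delta)]\), clamps to 0 and 1 outside this interval, and reports the resulting maximum error. Segment count and all numbers are illustrative assumptions of ours.

```python
import numpy as np

def sig(x):
    return 1.0 / (1.0 + np.exp(-x))

def logit(y):
    return np.log(y / (1.0 - y))            # Sig^{-1}

def chord_segments(delta, num_segments):
    """Return (slope, intercept, lo, hi) per segment on [logit(delta), logit(1-delta)]."""
    lo, hi = logit(delta), logit(1.0 - delta)
    xs = np.linspace(lo, hi, num_segments + 1)
    segs = []
    for a, b in zip(xs[:-1], xs[1:]):
        m = (sig(b) - sig(a)) / (b - a)
        segs.append((m, sig(a) - m * a, a, b))
    return segs

def sig_hat(x, segs):
    lo, hi = segs[0][2], segs[-1][3]
    if x <= lo:
        return 0.0                           # clamp below the interval
    if x >= hi:
        return 1.0                           # clamp above the interval
    for m, c, a, b in segs:
        if a <= x <= b:
            return m * x + c

delta = 0.005
segs = chord_segments(delta, 35)
grid = np.linspace(segs[0][2], segs[-1][3], 10001)
err = max(abs(sig(x) - sig_hat(x, segs)) for x in grid)
print(f"max error with 35 chord segments: {err:.4f}")
```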

We approximate \(\textrm{softmax} \) with its lower bound:

(17)

and the upper bound for the softmax is:

$$\begin{aligned} \widehat{\textrm{softmax}} (\vec {z})_i \;\le \; \textrm{softmax} (\vec {z})_i \;\le \; \mathrm {\widehat{Sig}} \Big (z_i - \max _{j\ne i}(z_j)\Big ) + \delta \end{aligned}$$
(18)

Theorem 1

Let \(\textrm{softmax} \) and \(\widehat{\textrm{softmax}}\) compute the real and the linearly approximated softmax (with precision \(\delta \)), respectively, for the last layer \(\vec {z}\) of \(n \ge 2\) neurons of a neural network, and let \(z_i = \max (z_1, \ldots , z_n)\). Then, we have the following result:

$$\begin{aligned} \forall \vec {z} . \ \textrm{softmax} (\vec {z})_{i} - \widehat{\textrm{softmax}}(\vec {z}) _{i} \le \frac{n-2}{(\sqrt{n-1} + 1)^2} + 2 \delta \end{aligned}$$

Proof

We refer to [44].    \(\square \)

Theorem 2

(Class consistency) Let f and \(\hat{f}\) denote the real and the approximated (with precision \(\delta \)) neural networks with \(n\ge 2\) outputs, respectively. Then:

$$\textrm{conf} (\hat{f}(\vec {x})) > \frac{1}{2} \implies \textrm{class} (\hat{f}(\vec {x})) = \textrm{class} (f(\vec {x})) $$

Proof

We refer to [44].    \(\square \)

Soundness. For the confidence-based 2-safety property discussed before, our analysis provides a soundness guarantee. This means that whenever the analysis reports that the property specified in Definition 4 holds, then the property also holds true in the concrete execution.

Theorem 3

(Soundness) Let f and \(\hat{f}\) be the original neural network and over-approximated neural network, respectively. Let \(b_{n,\delta }\) be the error bound of the approximated softmax (\(b_{n,\delta } =\frac{n-2}{(\sqrt{n-1} + 1)^2} + 2 \delta \) (see Theorem 1)). Then we have the following soundness guarantee: Whenever the approximated neural network is 2-safe for \(\textrm{conf} (\hat{f}(\vec {x})) > (\kappa - b_{n,\delta })\), the real neural network is 2-safe for \(\textrm{conf} (f(\vec {x})) > \kappa \), given \(\textrm{conf} (\hat{f}(\vec {x})) > \frac{1}{2}\). Formally:

$$\begin{aligned} &\left( \forall \vec {x}, \vec {x'}.\ \textrm{cond} (\vec {x}, \vec {x'}, \vec {\epsilon }) \wedge \textrm{conf} (\hat{f}(\vec {x})) > (\kappa - b_{n,\delta }) \implies \textrm{class} (\hat{f}(\vec {x})) = \textrm{class} (\hat{f}(\vec {x'})) \right) \\ &\quad \implies \left( \forall \vec {x}, \vec {x'}.\ \textrm{cond} (\vec {x}, \vec {x'}, \vec {\epsilon }) \wedge \textrm{conf} (f(\vec {x})) > \kappa \implies \textrm{class} (f(\vec {x})) = \textrm{class} (f(\vec {x'})) \right) ,\\ &\qquad \text{ with } \; \textrm{conf} (\hat{f}(\vec {x})) > \tfrac{1}{2} \end{aligned}$$

Proof

We refer to [44].    \(\square \)
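In practice, Theorem 3 is applied by shifting the confidence threshold that is passed to the solver. The small helper below (our own code, with an illustrative choice of \(\delta\)) makes this bookkeeping explicit.

```python
import math

def softmax_error_bound(n, delta):
    """b_{n,delta} from Theorem 1."""
    return (n - 2) / (math.sqrt(n - 1) + 1) ** 2 + 2 * delta

def adjusted_threshold(kappa, n, delta):
    """Threshold to use on the approximated network so that an 'unsat' answer
    certifies 2-safety of the real network at confidence kappa (Theorem 3)."""
    return kappa - softmax_error_bound(n, delta)

# e.g. a 3-output network verified with delta = 0.005:
print(adjusted_threshold(0.9, 3, 0.005))
```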

4 Implementation

For the implementation of our technique, we use the state-of-the-art neural network verification tool Marabou [23] as our solver. In this section, we describe how we encode the confidence-based 2-safety property in Marabou. Note that such an encoding can be expressed in a similar way for virtually any off-the-shelf neural network verifier.

Marabou [23]. Marabou is a simplex-based linear programming neural network verification and analysis tool. It can answer queries about a network's properties, such as local robustness, by encoding them into constraint satisfaction problems, and it supports fully-connected feed-forward neural networks. A network is encoded as a set of linear constraints representing the weighted sums of the neurons' outputs feeding the next neuron's input, and a set of non-linear constraints defining the activation functions. A verification query to Marabou comprises a neural network along with a property to be verified, defined as "linear and nonlinear constraints on the network's inputs and outputs" [23]. In Marabou, the network's neurons are treated as variables. The verification problem thus consists in identifying a variable assignment that satisfies all constraints simultaneously, or establishing that no such assignment exists. At its core, the tool uses a variant of the simplex algorithm to make the variable assignment satisfy the linear constraints; during execution, it adjusts the assignment to fix violations of linear or non-linear constraints. Although the technique implemented in Marabou is sound and complete, the tool works only with piece-wise linear activation functions (including the \(\textrm{ReLU} \) and \(\max \) functions) to guarantee termination. An essential further aspect of Marabou's verification approach is deduction, i.e., deriving more precise lower and upper bounds for each variable; the tool leverages these bounds to relax piece-wise linear constraints into linear ones by considering one of their segments.

The original network g is determined by its input parameters, neurons, connection weights, layer biases, \(\textrm{ReLU} \) activation functions, and output classes. To make Marabou amenable to the verification of 2-safety properties, we need a product neural network. This means that the execution is tracked over two copies of the original network, g and \(g'\) (cf. Subsect. 2.3). Let \(X_i\) denote the set of input variables to g and let \(X'_i\) be a set of primed copies of the variables in \(X_i\). As a result, we obtain a self-composition \(g(X_i) \times g'(X_i')\) of g over the input variables \(X_i \cup X_i'\).

Next, we extend the output layer with the softmax function in order to extract the confidence scores with which the output classes are predicted.

Linearized Sigmoid. We explain our linearized sigmoid function in this subsection. It implements an approximated, piece-wise linear sigmoid. Let the outputs of the last inner layer \(l-1\) be represented by \(z_i\) for \(1\le i\le n\), where n is the number of output classes. In Marabou, we first encode the piece-wise linear sigmoid obtained by setting the maximum acceptable error to 0.005. This provides us with a piece-wise linear approximated sigmoid with 35 segments of the form \(q_j = m_j \cdot z_i + c_j\) for \(LB_j \le z_i \le UB_j\), where \(z_i\) is the variable representing the output node whose confidence we want to find. We encode each segment as an equation in Marabou and represent it by a variable \(q_j\). Next, we need to select the segment applicable to the value of \(z_i\). Unfortunately, Marabou does not provide a conditional construct, so we use the \(\min \) and \(\max \) functions to emulate if-then-else.

First, we split the sigmoid into two convex pieces \(S_1\) and \(S_2\). Figure 2 illustrates this step using a simplified approximation of the sigmoid with 4 linear segments \(q_1\), \(q_2\), \(q_3\), and \(q_4\). The resulting value of \(S_1\) can now be expressed as \(S_1 = \min (\max (0, q_1, q_2), 0.5)\). Similarly, \(S_2 = \max (\min (1, q_3, q_4), 0.5)\). The values 0 and 1 are the minimum and maximum values of the sigmoid function, and 0.5 is the value of the sigmoid at the splitting point. Second, we combine the two convex pieces by adding them:

$$S = \min (\max (0, q_1, q_2), 0.5) + \max (\min (1, q_3, q_4), 0.5) - 0.5$$

Note that we have 35 segments instead of four used in our simplified explanation.
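A toy version of this min/max encoding, with four chord segments of the sigmoid (two per convex piece), can be checked directly in plain Python; the breakpoints below are illustrative choices of ours, not those used in the implementation.

```python
import numpy as np

def sig(x):
    return 1.0 / (1.0 + np.exp(-x))

def chord(a, b):
    """Linear segment through (a, sig(a)) and (b, sig(b))."""
    m = (sig(b) - sig(a)) / (b - a)
    return lambda x, m=m, c=sig(a) - m * a: m * x + c

# Two segments on the convex left half and two on the concave right half.
q1, q2 = chord(-6.0, -2.0), chord(-2.0, 0.0)
q3, q4 = chord(0.0, 2.0), chord(2.0, 6.0)

def S(x):
    s1 = min(max(0.0, q1(x), q2(x)), 0.5)    # convex piece, capped at 0.5
    s2 = max(min(1.0, q3(x), q4(x)), 0.5)    # concave piece, floored at 0.5
    return s1 + s2 - 0.5

xs = np.linspace(-8, 8, 1601)
print("max deviation from sigmoid:", max(abs(S(x) - sig(x)) for x in xs))
```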

Linearized Softmax. The next step is to implement the softmax function using the output of the sigmoid function and the \(\max \) function (see Eq. 17). To this end, we find the maximum of all output nodes excluding the current one and subtract that maximum from the current output value. Finally, we apply the linearized sigmoid described above to obtain the result of the softmax.

We repeat the above steps for all output nodes to obtain the softmax values corresponding to all output classes. Finally, we find the maximum value of these softmax outputs, which represents the confidence.
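Putting the pieces together, the confidence of the approximated network can be computed from the last inner layer as sketched below, in plain Python rather than Marabou constraints; `sig_hat` stands for any piece-wise linear sigmoid such as the one sketched earlier, and the correction terms of the lower bound from Subsect. 3.2 are omitted here for readability.

```python
import numpy as np

def softmax_hat(z, sig_hat):
    """For each output node: take the max of all *other* outputs, subtract it
    from the current output, and apply the linearized sigmoid."""
    z = np.asarray(z, dtype=float)
    return np.array([sig_hat(z[i] - np.delete(z, i).max()) for i in range(len(z))])

def conf_hat(z, sig_hat):
    # confidence = largest approximated softmax score
    return float(softmax_hat(z, sig_hat).max())

# e.g. with the exact sigmoid as a stand-in for the linearized one:
print(conf_hat([2.0, 1.0, 0.1], lambda t: 1.0 / (1.0 + np.exp(-t))))
```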

Fig. 2.

(Simplified) Approximation of sigmoid with 4 linear segments

5 Experimental Evaluation

We evaluate our technique on four publicly available benchmark datasets. We pre-process the datasets to remove null entries, select relevant categorical attributes, and one-hot encode them. For each dataset, we train a fully connected feed-forward neural network with up to 50 neurons and ReLU activation functions.

German Credit: The German Credit Risk dataset [45] describes individuals requesting credit from a bank and classified, based on their characteristics, in two categories (“good” or “bad”) of credit risk. The dataset comprises 1000 entries.

Adult: The Adult dataset, also referred to as the “Census Income” dataset, is used to estimate whether a person’s income surpasses $50,000 per year based on census information [46].

COMPAS: COMPAS (“Correctional Offender Management Profiling for Alternative Sanctions”) is a widely-used commercial algorithm that is utilized by judges and parole officers to assess the probability of criminal defendants committing future crimes, also known as recidivism [47].

Law School: The Law School Admissions Council (LSAC) provides a dataset called Law School Admissions, which includes information on approximately 27,000 law students from 1991 to 1997. The dataset tracks the students’ progress through law school, graduation, and bar exams. It uses two types of academic scores (LSAT and GPA) to predict their likelihood of passing the bar exam [48].

We use TensorFlow for training the neural networks and the NN verifier Marabou [23], whose implementation is publicly available. The accuracies of the deployed models are as follows: German Credit: 0.71; COMPAS: 0.74; Law: 0.94; Adult: 0.77. In our experiments, adding more layers or nodes per layer did not increase accuracy. We run all our experiments on a single AMD EPYC 7713 64-core processor under Ubuntu 22.04 LTS with 32 GB RAM.

Fig. 3.

Input distance vs. confidence for German credit dataset

Fig. 4.

Input distance vs. confidence for adult dataset

Fig. 5.

Input distance vs. confidence for law school dataset

Fig. 6.

Input distance vs. confidence for COMPAS dataset

First, we present our confidence-based global robustness results. We evaluate our implementation on the neural networks trained with the benchmark datasets for various combinations of input distance and confidence values, aiming to find proofs that the neural networks are globally robust. The plots in Figs. 3, 4, 5 and 6 show our experimental results as scatter plots. Markers denoting ‘sat’ correspond to queries resulting in a counter-example. A counter-example here means that for the input distance and confidence values in that query, the inputs are classified into different output classes. The ‘unsat’ markers stand for queries that are proved (i.e. the model is robust), which means that for the corresponding input distance and confidence threshold, the inputs are classified into the same output class and the model is globally robust. The color bar on the right denotes the time taken in seconds to run each query; the scale goes from deep purple to blue, green and yellow as the time increases from 0 to 60 s. The plot in Fig. 3 depicts the effect of varying input distance and confidence on the German Credit benchmark. We ran our query with the confidence-based global robustness property for input distance \(\epsilon \) ranging from 0.001 to 1.0 and confidence \(\kappa \) ranging from 0.5 to 0.9. Observe that for \(\kappa \) values below 0.7, the queries are sat, i.e. we find counter-examples. For confidence values above 0.75, however, even for larger input distances, the queries result in unsat and thus a proof that the model is robust above a confidence threshold of 0.75.

The plots in Figs. 4, 5 and 6 show the results for neural networks trained with Adult, Law School, and COMPAS datasets. As can be observed from the scatter plots, these models are robust. For confidence values above 0.5, they are 2-safe and we are successfully able to prove this rather fast in 50 s or less.

Table 2. Global fairness on German credit/COMPAS datasets for various criteria

Next, we present the results for confidence-based global fairness verification, which are shown in Table 2. Each row in the table depicts the verification result for a NN along with the sensitive attribute and confidence threshold considered. If the result is ‘unsat’, it means that the query is proved (i.e. the model is fair). In other words, for the corresponding sensitive attribute and confidence value constraints, the inputs are classified to the same output class and the model is globally fair. On the other hand, ‘sat’ corresponds to the query resulting in a counter-example. A counter-example here means that for the corresponding sensitive attribute and confidence threshold in the query, the inputs are classified into different output classes.

The German credit model is proved to be globally fair for confidence values above 0.5 for sensitive attributes Gender and Age. Running our query with the confidence-based global fairness property for the COMPAS model, with Gender as the sensitive attribute gives counter-examples for all confidence values. Additionally, when Ethnicity is considered as the sensitive attribute while verifying the COMPAS model, we find counter-examples for lower confidence values. However, the model is proved to be globally fair for confidence values above 0.999.

We combined our method with binary search to synthesize the minimum confidence for which the DNN is globally robust or fair. We start the binary search with confidence 0.5. If the query is unsat, we are done. Otherwise, we check confidence \(mid = (0.5 + 1)/2\) and continue in this manner until we find the minimum confidence, accurate to the nearest 0.05. For instance, binary search combined with our method on German Credit yielded 0.75 (in 45 s) as the minimum confidence for which the DNN is globally robust.
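The binary search itself is straightforward; the sketch below assumes a predicate `is_safe(confidence)` (a name of ours) that runs the verification query at the given threshold and returns True exactly when the query is unsat.

```python
def min_certified_confidence(is_safe, lo=0.5, hi=1.0, tol=0.05):
    """Smallest confidence threshold (up to tol) at which the network is
    verified globally robust/fair, or None if even hi is not sufficient."""
    if is_safe(lo):
        return lo                      # already safe at confidence 0.5
    if not is_safe(hi):
        return None
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if is_safe(mid):
            hi = mid                   # safe: try a lower threshold
        else:
            lo = mid                   # counterexample: need a higher threshold
    return hi
```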

Our experimental results on 2-safety properties (with regard to both global robustness and global fairness) clearly show that taking confidence into account, along with input distance, is crucial when verifying neural networks.

5.1 Discussion

Soundness. Our proof of soundness guarantees that if our approach reports a model to be robust or fair for a given confidence and input distance, the model is indeed safe. In the case of the German Credit model, for instance, the model is indeed globally robust for all input distances when the confidence is at least 0.75. Moreover, we can use binary search to find the minimal confidence value above which a model is robust. If a counterexample is found, however, it may be a false positive. False positives (or spurious counterexamples) may in general stem either from over-approximations of the underlying reachability analysis tool or from our own softmax approximation. In our implementation, the former do not occur since Marabou is complete (i.e., it does not produce false positives), whereas our softmax approximation yields a confidence error that depends on the number of DNN outputs, as formalized in Theorem 1 and quantified in its proof. For DNNs with two outputs, such as German Credit, Adult, and Law School, there is no error, whereas for three outputs (COMPAS) the error is approximately 0.171. Hence, if we want to certify a three-output DNN for confidence x, we run our analysis for confidence \(x-0.171\): if no attack is found, we can certify the network for confidence x (soundness); otherwise we know the counterexample violates the 2-safety property (by completeness of Marabou) for a confidence between x and \(x-0.171\) (the possible imprecision is due to our softmax over-approximation). We report counterexamples in this paper on German Credit and on COMPAS: the former are true positives (2-output DNN), whereas the latter are counterexamples for a confidence between 0.999 (the confidence we can certify the network for) and 0.828. We ran the network on the counterexample reported by Marabou and found the real confidence to be 0.969. Hence, the network is certainly fair for confidence levels higher than 0.999 and unfair for confidence levels lower than 0.969, while we cannot decide fairness for confidence levels in the interval between 0.969 and 0.999. This means that on our datasets, our analysis is very accurate.

Threats to Validity. Presuming a high level of confidence as a precondition can make low-confidence networks vacuously safe. However, an accurate but low-confidence network is not desirable in the first place. This phenomenon is known in the literature as miscalibration. The Expected Calibration Error is defined as the weighted average of the absolute difference between confidence and accuracy. In scenarios where accurate confidence measures are crucial, the goal is to reduce the Maximum Calibration Error [49], that is, the maximum discrepancy between confidence and accuracy.

This is orthogonal to our work and there is an entire field of research [50, 51] aiming at minimizing such calibration errors.

6 Conclusion

We introduce the first automated method to verify 2-safety properties such as global robustness and global fairness in DNNs based on the confidence level. To handle the nonlinear \(\textrm{softmax}\) function computing the confidence, we approximate it with a piece-wise linear function for which we can bound the approximation error. We then compute the self-composition of the DNN with the approximated \(\textrm{softmax}\) and we show how to leverage existing tools such as Marabou to verify 2-safety properties. We prove that our analysis on the approximated network is sound with respect to the original one when the value of confidence is greater than 0.5 in the approximated one. We successfully evaluate our approach on four different DNNs, proving global robustness and global fairness in some cases while finding counterexamples in others.

While we improve over recent verifiers for global properties that are limited to binary classifiers [25], a limitation of our current approach is that we can only handle DNNs with few (two to five) outputs, since the approximation error increases with the number of outputs. We plan to overcome this limitation in future work by devising more accurate abstractions of \(\textrm{softmax}\).

To improve scalability, we will investigate how to refine our approach by integrating pruning strategies, such as those developed in [25], which we intend to refine to fit our static analysis framework.

We also plan to explore more sophisticated and effective verification techniques for 2-safety properties, possibly tailored to specific DNN structures.

Finally, we plan to complement our verification approach with testing techniques to further explore the generated counterexamples.