Advances in verification of ReLU neural networks

We consider the problem of verifying linear properties of neural networks. Despite their success in many classification and prediction tasks, neural networks may return unexpected results for certain inputs. This is highly problematic with respect to the application of neural networks for safety-critical tasks, e.g. in autonomous driving. We provide an overview of algorithmic approaches that aim to provide formal guarantees on the behaviour of neural networks. Moreover, we present new theoretical results with respect to the approximation of ReLU neural networks. On the other hand, we implement a solver for verification of ReLU neural networks which combines mixed integer programming with specialized branching and approximation techniques. To evaluate its performance, we conduct an extensive computational study. For that we use test instances based on the ACAS Xu system and the MNIST handwritten digit data set. The results indicate that our approach is very competitive with others, i.e. it outperforms the solvers of Bunel et al. (in: Bengio, Wallach, Larochelle, Grauman, Cesa-Bianchi, Garnett (eds) Advances in neural information processing systems (NIPS 2018), 2018) and Reluplex (Katz et al. in: Computer aided verification—29th international conference, CAV 2017, Heidelberg, Germany, July 24–28, 2017, Proceedings, 2017). In comparison to the solvers ReluVal (Wang et al. in: 27th USENIX security symposium (USENIX Security 18), USENIX Association, Baltimore, 2018a) and Neurify (Wang et al. in: 32nd Conference on neural information processing systems (NIPS), Montreal, 2018b), the number of necessary branchings is much smaller. Our solver is publicly available and able to solve the verification problem for instances which do not have independent bounds for each input neuron.


Introduction
During the last few years, various approaches have been presented that aim to provide formal guarantees on the behaviour of neural networks. The use of such verification methods may be crucial to enable the secure and certified application of neural networks for safety-critical tasks. Moreover, based on first results of [32], awareness was raised that neural networks are prone to fail on so called adversarial examples. These are created by small perturbations of input samples, such that the changes are (almost) imperceptible to humans. However, these perturbations are often sufficient to make a neural network fail on the input sample. The existence of such adversarial examples can be ruled out by methods of neural network verification. In fact, a closely related line of research termed as robustness certification is focused explicitly on this topic.
In the following section we formally introduce the problem that we regard. In Sect. 3 we provide an overview of related work, and present formulations of the verification problem as MIP in Sect. 4. In the subsequent sections we consider approximation techniques, primal heuristics, and branching methods for verification of neural networks. Extensive computational results on the performance of our solver and others can be found in Sect. 8, and Sect. 9 concludes the paper with some final remarks. Additional material can be found in the appendices. Our solver, which is based on the academic MIP solver SCIP [13], is publicly available at https://github.com/roessig/verify-nn. For the ease of notation, we use [n] for n ∈ N to denote the set {1, . . . , n}. In our work, we only regard trained neural networks, which can be seen as immutable and deterministic functions F : R n → R m . F is determined by its weights and biases ((A l , b l )) L l=1 . It holds A l ∈ R N l ×N l−1 for l ∈ [L] and b l ∈ R N l , l ∈ [L], where L is the number of layers in the neural network. N 0 , . . . , N L are the numbers of neurons per layer (cf. Bölcskei et al. [3], Definition 1).

Problem definition
Now we give a formal definition of the verification problem for ReLU neural networks and comment on some relevant properties of this problem. In the following we use the term (solving) model to refer to an algorithmic approach for solving the verification problem. This may encompass a range of choices in obtaining an actual algorithm.

Definition 1 (Verification Problem for ReLU Neural Networks)
Assume that ∅ = X ⊂ R n is a polytope, and let ∅ = Y ⊂ R m be such that . Given a neural network F : X → R m with ReLU activation function, the verification problem consists in the decision whether F(X ) ⊆ Y holds. A triple (X , Y , F) will be called an instance of the verification problem (for ReLU neural networks). Furthermore, if F(X ) ⊆ Y , we say that the instance is verifiable, otherwise it is refutable.
The construction of the feasible input polytope X and the set of admissible outputs Y is solely based on the application for which the neural network shall be used. Depending on the algorithm which is used to solve the problem, the halfspaces Q i can either be open or closed. Though, either all of them must be closed or all of them must be open. However, the use of floating point arithmetic by a solver for the verification problem makes this distinction rather unimportant, since numerical comparisons require the use of a certain threshold difference.
Moreover, Katz et al. [17] show that the verification problem for ReLU neural networks is NP-complete. Hence we cannot expect that the problem can be solved efficiently in general. We also follow the naming concept of Katz et al. [17] and refer to verifiable instances of the verification problem as UNSAT instances, and to refutable instances as SAT instances. This naming corresponds to the existence of a counterexample as defined in the following remark.

Remark 1
If an instance (X , Y , F) is refutable, i.e. F(X ) Y , we want to provide x ∈ X such that F(x) / ∈ Y . We will refer to this x ∈ X as a counterexample for the instance.
Remark 2 More complex properties can be investigated by spliting them into separate instances. For example, if Y = k i=1 Q i ∩ l j=1 P j for halfspaces Q i and P j and k, l ∈ N, then F(X ) ⊆ Y holds if and only if F(X ) ⊆ ( k i Q i ) and F(X ) ⊆ ( l i P i ).

Remark 3
Considering an instance Π = (X , Y , F) of the verification problem with X ⊂ R n , we will often assume the existence of bounds l i , u i for i ∈ [n] such that l i ≤ x i ≤ u i for x ∈ X . Indeed, the requirements of Definition 1 justify this assumption. These bounds can be computed by solving one LP per bound. We set In fact, for all publicly available instances of the verification problem that we are aware of, the polytope X is actually a box which is directly given by the bounds l i , u i for i ∈ [n]. For these instances it is thus not necessary to solve any LP in order to obtain the bounds. However, in this paper we also consider instances where X is not a box, cf. Sect. 8 and "Appendix A".

Remark 4
Assume that we are given an instance Π = (X , Y , F) of the verification problem as introduced in Definition 1. Some solving models of other authors are not only limited to instances where the input polytope X is in fact a box. Also the choice of output constraints as represented by Y is more restricted for these models. These require that Y = k i=1 Q i ⊆ R m , where k ∈ N and Q i ⊆ R m is an open halfspace for i ∈ [k]. Indeed, this is the only of the cases which are regarded in Definition 1 where R m \Y is a polyhedron. Yet, it is possible to use such restricted solving models to solve an instance Π = (X , Y , F) where Y = k i=1 Q i ⊆ R m for open halfpaces Q i ⊆ R m . To this end, it is necessary to split the corresponding instance into k instances (X , Q i , F). Clearly, if F(X ) ⊆ Q i for all i ∈ [k], then it holds F(X ) ⊆ Y and Π is verifiable. On the other hand, if there is x ∈ X and some i ∈ [k] such that F(x) / ∈ Q i , we know that Π is refutable since F(X ) Y . We will refer to such an instance Π as conjunction instance. On the other hand, an instance for open halfspaces Q i ⊆ R m , will be called disjunction instance. We will also regard those instances as disjunction instances that fulfill Y = Q for some open halfspace Q ⊆ R m . In fact, all instances that we consider in our computational experiments (see Sect. 8) are based on open halfspaces. Closed halfspaces are only mentioned in some cases to provide a comprehensive explanation.
Often, we will regard constraints of the form y = ReLU(x) := max({x, 0}) for x ∈ [l, u], y ∈ R, that refer to a certain neuron with ReLU activation function. If the bounds l, u ∈ R with l ≤ u are such that either l ≥ 0 or u ≤ 0, we say that the corresponding neuron is fixed (in its phase).
A box, a polytope or a union of polytopes is defined as the feasible input domain X for the property which shall be verified. Then, linear properties are defined that we denote in terms of a set Y , such that Π is verifiable if and only if F(X ) ⊆ Y . Complete algorithms (except in [9]) are employed to solve this problem, i.e. if there existsx ∈ X such that F(x) / ∈ Y , this will be reported. Clearly, the verification problem is not necessarily limited to neural networks with ReLU activations, i.e. other activation functions are sometimes considered, too. Cheng et al. [7] and Narodytska et al. [20] regard the verification problem on binarized neural networks, which we do not investigate further.
First approaches to verification of neural networks [21,22,26] belong to the field of satisfiability modulo theories (SMT), which generalize the boolean satisfiability problem by replacing variables with predicates from various theories. Also the solver Reluplex [17] for verification of neural networks is presented in this context, but solves instances which are significantly more difficult, using an extended version of the well known simplex algorithm. Ehlers [10] presents the solver Planet, which is based on LP and SAT solving. Dvijotham et al. [9] formulate the verification problem as a non-convex optimization problem and obtain an incomplete algorithm. Xiang et al. [41] regard the propagation of an input polytope through a ReLU neural network, and Xiang et al. [40] propose to discretize the input domain in order to verify neural networks. However, their work remains limited to theoretical considerations and the presentation of numerical toy examples.
Various authors [6,11,19,33] consider MIP models for the verification problem. The performance of such MIP models is predominantly determined by the quality of the bounds which are computed for the ReLU neurons in the neural network. For that reason, the computation of such bounds is discussed in more detail in Sect. 5. The use of appropriate branching schemes is also important for an MIP model of the verification problem, we will provide more details on this in Sect. 7. In fact, it is not necessary to solve the verification problem as an MIP if such approximation and branching methods are used. Bunel et al. [5] present such a branchand-bound method without solving the verification problem as an MIP directly. Moreover, they provide a good comparison of various methods for neural network verification. Besides their own approach, the empirical evaluation includes Reluplex [17], Planet [10], and an MIP model based on the suggestions of various authors [6,19,33]. While we also implement an MIP model to solve the verification problem, its functioning is more similar to the branchand-bound method of Bunel et al. [5] than to the MIP model they use in their comparison. Besides, we consider various additional aspects, and therefore speed up the solving process significantly. For a computational comparison of other solvers with ours, we select Reluplex [17] and the branch-and-bound method [5]. The other solvers regarded by Bunel et al. [5] are not competitive with these, as their experimental results show. Moreover, we regard the solvers ReluVal and Neurify as introduced by Wang et al. [35,36]. The concept for both solvers is also a branch-and-bound scheme, that works with a frequent linear approximation of the regarded neural network. In contrast to the method of Bunel et al. [5], the approximation is not as good, but much faster to compute.
Anderson et al. [2] present an ideal MIP formulation for ReLU constraints which is closely related with the techniques used in our work. Especially, they present a separation routine which can be used to compute stronger neuron bounds. Optionally, we include this separator in our solving model as mentioned in Sect. 8.2. Nevertheless, it should be noted that the results of Anderson et al. [2] do refer only to single ReLU neurons and at most the layer before. Hence we do not have an ideal formulation of the whole network which implies that solving the verification problem cannot be reduced to solving an LP using their formulations.
The idea of output range or reachability analysis is in principle to compute the output range F(X ) of a neural network F, given an input domain X . Since this is quite difficult, the relevant work of Dutta et al. [8] and Ruan et al. [25] is limited to computing the range g(F(X )), for some function g : F(X ) → R. The function g should then give some insights into the output of the neural network F on input domain X . Clearly, this problem is closely related to the verification problem.
Several authors [12,23,[29][30][31]34,[37][38][39]42,43] consider the problem of computing robustness guarantees for neural networks which are used for classification. Robustness means, that the classification of an input sample should remain the same when the input is changed by small perturbations. The computation of certified robustness bounds should rule out the existence of adversarial examples. Indeed, this problem is a special case of neural network verification. Except for Tjeng et al. [34], this problem is solved by incomplete algorithms. That means, an algorithm either returns a guarantee that a region around an input sample is free of adversarial examples, or no result, which is due to the use of approximations.
Modelling ReLU neural networks as MIPs is considered in the literature for other application domains, too. Grimstad and Andersson [15] investigate the usage of ReLU neural networks as surrogate models in MIPs and study various bound tightening techniques. Serra et al. [28] apply a MIP formulation of a ReLU neural network to enable a lossless pruning method. This way, equivalent neural networks of smaller size can be obtained. The computation of linear regions in ReLU neural networks is another field of application [27].

Neural network verification as MIP
It is straightforward to formulate the verification problem as a mixed integer program (MIP), see [6,8]. We present a slightly improved formulation, as it can be found in [5,34]. In this formulation, each neuron is represented by one or two (continuous) variables. The value of a neuron before application of the ReLU function is given as a linear combination of the output values of the predecessor neurons in the network plus the bias. That means, this connection can be simply modelled by a linear equation in the MIP. We need two variables for neurons with ReLU activation function. Let variable x be the input value to the ReLU function and y be the output value. In this setting we will refer to x as the ReLU input variable and to y as ReLU output variable. We want to model y = max{0, x}, which is represented using one additional binary variable d. Furthermore, we need that upper and lower bounds l ≤ x ≤ u are known. Then we obtain the following constraints which are equivalent to y = max{0, x}: Of course it is possible that we have l ≥ 0 or u ≤ 0 for the bounds. In these cases, we can omit the binary variable d and replace (1) as follows. If l ≥ 0, this implies y = max{0, x} = x, i.e. (1) is replaced by y = x for x ∈ [l, u]. If u ≤ 0, we have y = max{0, x} = 0 and thus we can set y = 0 for x ∈ [l, u]. These cases correspond to fixing the binary variable d to 1 or 0, respectively. Let Π = (X , Y , F) be a disjunction instance of the verification problem such that it holds . Then it is straightforward to formulate an MIP which is feasible if and only if Π is refutable. The instance Π of the verification problem is represented by the following constraints: x ∈ X , y ∈ R m \Y and y = F(x) (2) This is an MIP, since x ∈ X and y ∈ R m \ Y can be represented by linear constraints. Especially, y = F(x) can be expressed by linear constraints combined with integrality constraints for auxiliary binary variables that are used to model the ReLU function as shown in (1) , we consider two options. Either we split instance Π into k instances as mentioned in Remark 4 or we formulate the verification problem as an optimization problem as proposed by Bunel et al. [5]. In this setting, an instance Π = (X , Y , F) is verifiable if the optimum value of the corresponding optimization problem is greater than zero and refutable if it is lower than zero. Assume The same holds for closed halfspaces Q i , i ∈ [k], if all inequalities ">" are replaced by their counterparts "≥". Analogously, with open halfspaces Q i as before and Y = k i=1 Q i the same can be shown with "min" instead of "max". For the case Y = k i=1 Q i we consider the following MIP: Indeed, (3) is an MIP since the constraint t = max{z 1 , . . . , z k } can be replaced by linear constraints using k additional binary variables as shown in [5,34]. In this case, we can also replace the constraint by t ≥ z 1 , . . . , t ≥ z k .
, is verifiable if and only if the optimum value of (3) is greater than zero.
It follows t ≥ z j := q T j y −b j > 0 which implies the desired result since x ∈ X was arbitrary. Remind that we regard optimum solutions of an MIP so it suffices to consider finitely many x ∈ X .
For the opposite direction, assume that the optimum valuet of (3) fulfillst > 0. Let x ∈ X be arbitrary and y = F(x).
it holds max{z 1 , . . . , z k } ≥t > 0 sincet is optimal. In other words, there is j ∈ [k] such that q T j y − b j = z j > 0 and thus y ∈ Q j ⊆ Y . Since x ∈ X was arbitrary, Π is verifiable.
It works also for the case Y = k i=1 Q i by replacing "max" with "min" in (3), and similarly for closed halfspaces. In practice, the optimum valuet of (3) will usually be significantly greater than zero if an instance is indeed verifiable. Clearly, it is not necessary to actually computet in order to solve the verification problem as Bunel et al. [5] point out. If the dual bound of (3) is greater than zero, the instance is verifiable. We mainly use this formulation in our implementation. On the other hand, if the primal bound of (3) is lower than zero, we know that the corresponding instance of the verification problem is refutable ast < 0 is already implied. However, this case has less relevance since primal solutions are usually only found by specialized heuristics which we describe in Sect. 6.
Besides, we note that the verification problem can be modelled as quadratic program. This formulation does not require any integer or binary variables as the nonlinear behavior of the ReLU activations is modelled by the quadratic objective function and an optimality condition.
and k ∈ N. Let ((A l , b l )) L l=1 be the weights and biases corresponding to F. Here, L is the number of layers in the neural network and N 0 , . . . , N L are the numbers of neurons per layer. This implies X ⊆ R N 0 and Y ⊆ R N L and we can state the formulation:

Theorem 2 Instance Π is refutable if and only if the quadratic program (4) is feasible and the optimum value is zero. Otherwise Π is verifiable.
Proof We first assume that Π is refutable so that we can find x ∈ X with F(x) / ∈ Y . We set On the other hand, if (4) is feasible and the optimum value is zero, we know that there is To evaluate this formulation computationally, we tried a plain implementation in SCIP [13]. Within a time limit of one hour, SCIP is not able to solve any of the disjunction instances in our SAT and UNSAT evaluation sets. Only very easy MNIST instances could be solved with this formulation.
Anderson et al. [2] present an ideal MIP formulation for ReLU constraints which can replace (1). It should be noted that the formulation is ideal for a single ReLU neuron but not for the whole neural network. As the formulation of Anderson et al. [2] has an exponential number of constraints, they also describe a separation routine that runs in linear time. This allows to strengthen formulation (1) by adding additional cuts to the LP relaxation, as obtained by the separation routine.

Approximations of ReLU neural networks
Solving the problem of neural network verification requires to model constraints of the form y = max{0, x} for all ReLU input variables x and corresponding ReLU output variables y of each layer. It is crucial to obtain tight bounds l, u on the value of x before the application of the ReLU function. Especially, we regard the linear approximation of these constraints for a whole layer at once, an idea so far considered only briefly in [2,5,23].
Given an instance Π = (X , Y , F) of the verification problem with X ⊂ R n , we will assume the availability of input bounds l i , u i with i ∈ [n] for the components of X throughout this section (cf. Remark 3). All approximation methods that we present are executed layer by layer. Based on the input bounds, we compute bounds for the neurons in the following layer. This process is iterated until the last layer is reached, i.e. the output layer. Depending on the instance and the bound computation approach, it may be possible to prove that Π is verifiable using only these bounds for the output layer. Assume that we have a set A which approximates the neural network output F(X ), i.e. F(X ) ⊆ A. In case that A ⊆ Y , we have thus shown that F(X ) ⊆ Y , which means that Π is verifiable.

Basic approximation methods for bound computations in neural networks
The most simple approximation approach is naive interval arithmetic as used in [10,35]. Figure 1 provides a visual representation of this approximation, to which we will refer as x y naive approximation. This simple approach mainly suffers from the fact that it assumes the independency of all predecessor neurons when computing a new bound. Therefore, the bounds computed with this method are so bad, that they only serve to solve tiny instances. Wang et al. [35] use symbolic interval arithmetic to keep track on some of the neuron dependencies in order to compute better bounds. The idea is to keep a symbolic equation, based on the input values of the network, for each neuron. This symbolic approach can only provide better bounds if at least some of the ReLU activations can be fixed positively, i.e. l ≥ 0. Otherwise, the symbolic interval arithmetic uses the same bounds as the naive method and computes new bounds in the same way.
To overcome this drawback, Wang et al. [36] improve the method by introducing a different approximation for the case l < 0 < u. The main idea is to maintain the symbolic dependencies also in this case. Though, the linear equation for the value of a ReLU neuron with input bounds l < 0 < u cannot be kept. Instead a symbolic equations is introduced which provides a lower and upper bound for the neuron value. These symbolic bounds can then be propagated through the network and have the advantage that the dependency information partially remains. For the propagation of the symbolic equactions an approximation is used as visualized in Fig. 2. Now we consider a linear approximation of ReLU constraints which was first proposed by Ehlers [10]. In fact, we will show that it is best possible in a certain sense, which we define in the following subsection. Given We graphically depict this approximation in Fig. 3 which in fact coincides with the linear relaxation of the MIP formulation (1) for ReLU constraints, see [2,5].
Of course, the linear approximation of Ehlers [10] remains valid, if either the constraint y ≥ 0 or the constraint y ≥ x is removed. This enables the use of matrix multiplication (cf. Zhang et al. [42]) or static analyzers with abstract domains (cf. Singh et al. [30]) for the propagation of the inequations.
Another approximation method is proposed by Raghunathan et al. [23] in the context of robustness certification. It consists in an SDP relaxation for ReLU neural networks that acts simultaneously on all neurons of a layer.

Comparison of linear ReLU approximations
In general, one ReLU layer contains several neurons, and we are interested to compute an approximation of the output range of the layer. This approximated output range can then be regarded as input to the next layer. As we want to reach a quick propagation of the output ranges through the layers, it is important that the approximated output range is a polytope. This allows to compute neuron bounds quickly using linear programming. In the following, we develop a theoretical framework to analyse different linear approximations.
Definition 2 (ReLU approximation) Let n, m ∈ N, A, B ∈ R m×n , c ∈ R n , and P ⊂ R n be a polytope. We say that The consideration of ReLU proper polytopes simplifies the formulation of statements, as fixed ReLU neurons are not regarded. If we apply the naive approximation to a ReLU proper polytope we obtain a box [0, where u i is the upper bound for the corresponding variable. We see that this is an independent ReLU approximation. Let A = 0 ∈ R 2n×n and for each i ∈ [n], we add two rows to matrix B and vector c to enforce 0 ≤ y i ≤ u i for i ∈ [n], i.e. m = 2n for the m in Definition 2. These rows are e T i y ≤ u i and −e T i y ≤ 0, where e i is the i-th unit vector in R n . Hence, we have exactly one non-zero coefficient in each row of B and only zero coefficients in A, so that the property holds. In passing we notice that the approximation proposed in Wang et al. [36] is an independent ReLU approximation, too. Now we use our definition of a ReLU approximation for a more thorough investigation of the possibilities to approximate ReLU constraints. Within the restrictions of the definition, we would like to find matrices A, B and c for a ReLU proper polytope P, such that Q is as small as possible (with respect to inclusion). First, we will restrict our analysis to independent ReLU approximations and claim: the approximation proposed by Ehlers [10] is best possible among all independent ReLU approximations of a ReLU proper polytope. We define this approximation formally as a ReLU approximation in order to state the result in Theorem 3.

Definition 3
Let P ⊂ R n be a ReLU proper polytope. The ReLU approximation of P corresponding to the approximation of Ehlers [10] will be denoted as Q E . In detail, for i ∈ [n], we set For that, we use l i := min x∈P x i and u i := max x∈P x i and eventually define (1) . . . (1) . . . (1) . . .
Thus we obtain Remark 5 Indeed, Q E is an independent ReLU approximation. All rows of A and B are either 0 or a multiple of a transposed unit vector e T i ∈ R n . If the latter is the case, i ∈ [n] is the same both in A and B when regarding the same row indices of A and B.
Theorem 3 Let P ⊂ R n be a ReLU proper polytope and Q E be the approximation of P as in Definition 3. For any independent ReLU approximation Q of P it holds Q E ⊆ Q.
For the proof see "Appendix C". In the following section, we explain how the approximation of Ehlers [10] is used in [5,10] and discuss possibilities to speed up the computation to make this method more efficient. Then we present an approximation that is stronger than the one of Ehlers [10] and hence not independent in Sect. 5.4.

Efficient optimization based bound tightening for neural network verification
If we build the MIP model using some preliminary lower and upper bounds for each ReLU neuron, we can use the LP relaxation of the model to approximate the output values of the neural network. As in [5,10], we can also tighten the neuron bounds using the LP relaxation, which is identical to the approximation of Ehlers [10]. For each ReLU input variable x we compute an optimal solution of the LP relaxation for the objective functions x and −x. The optimum objective values hence give the new bounds for x in the neural network. In accordance with Gleixner et al. [14], we call this technique optimization based bound tightening (OBBT).
After the bound update, it is crucial to improve the MIP formulation (1). This allows to compute significantly tighter bounds in the next layer. Indeed, it is possible to build the approximation of the whole network only during the process of bound optimization. That means, each variable (corresponding to a neuron) is added separately to the relaxed MIP model such that the LP is always as small as possible. We regard the ideas of Gleixner et al. [14], who implemented an OBBT propagator in SCIP [13], to reduce the computational cost. Gleixner et al. [14] treat two main topics: First, they show how to generate and propagate Langrangian variable bounds (LVBs), and second, they propose methods for the acceleration of OBBT.
LVBs are valid inequalities with respect to the LP relaxations of mixed integer nonlinear problems (MINLP), which also includes LP relaxations of MIPs. Gleixner et al. [14] state that LVBs can be viewed as a one-row relaxation of the given MINLP, that provide a local approximation of the effect of OBBT. They are obtained as a by-product of the LP solutions which are computed during the execution of OBBT. Dual multipliers of the LP relaxation and an objective cutoff constraints are used to create an LVB. For the actual definition and more details see Gleixner et al. [14]. If we model the neural network verification problem as optimization problem as described in Sect. 4, we are only interested whether there exists a solution with objective value smaller than or equal to zero or not. Hence, we can safely cut off all solutions with an objective value greater than some ε > 0. For our experiments we set ε := 0.1 to have a sufficiently big margin to zero in order to prevent erroneous results.
The advantage of LVBs is that they can be propagated efficiently through a branch-andbound tree, while the frequent application of OBBT requires a great computational effort for each branch-and-bound node that is processed. We see in our experiments that for some instances the use of LVBs is able to speed up the solving process significantly. See Sect. 8.2 for an overview of the experiments.
Moreover, we consider ideas of Gleixner et al. [14] for accelerating the application of OBBT. One aspect is filtering of bounds which can hardly be improved by executing OBBT. Assume that y is the value of a variable, which is a candidate for the application of OBBT, in a feasible solution of the LP relaxation. Moreover, let l ≤ y ≤ u be the bounds which are currently known for this variable. If then y − l ≤ ε or u − y ≤ ε for some ε > 0, OBBT can strengthen the corresponding bound by at most ε, as Gleixner et al. [14] point out. Yet, initial experiments showed, that usually almost all bounds can be improved significantly by OBBT, so that filtering bounds is not useful for verification of neural networks. Another aspect is the order of the variables for which OBBT is executed. As OBBT is executed layer by layer in our case, the order of variables can only be changed within each layer. However, the various strategies of Gleixner et al. [14] did not show any advantage over a simple fixed order in our computational experiments, see Rössig [24] for details.
Eventually, we consider another approach for bound computations in neural networks that is also a form of OBBT. Instead of using the LP relaxation to compute bounds, it is also possible to employ the exact MIP model and compute bounds for the neurons with OBBT. Computing the neuron bounds using the MIP formulation instead of the LP relaxation leads to strongly improved bounds. Although not all MIPs are solved to optimality, clear improvements of the corresponding bounds can be reached within a time limit of few seconds per MIP. These improvements render it possible to solve also relevant instances without specialized branching rules for neural network verification, however the bound computations take a lot of time.

Optimization based relaxation tightening for two variables
In general we regard neural network layers that feature ReLU activations for all neurons of the layer. Hence, we investigate in more detail how the ReLU function behaves in higher dimensions, i.e. if the ReLU function is applied componentwise to layers with several neurons. The following theorem can be found in Xiang et al. [41] as Corollary 1: Theorem 4 For a polytope P ⊂ R n , ReLU(P) is a finite union of polytopes.
Hence we see that the best possible convex approximation conv( ReLU(P)) of ReLU(P) is the convex hull of the union of polytopes in Theorem 4. We investigate a simple example to see how the approximation of Ehlers [10] differs from this best possible convex approximation. We consider a toy example as depicted in Fig. 8 of the "Appendix". Figure 4 shows the feasible input polytope of the ReLU layer and the corresponding ReLU image. The same ReLU image can be seen in Fig. 5, replenished with a depiction of its convex hull, the approximation of Ehlers [10] and the naive approximation. Figure 5 clearly shows that even for only two variables the convex hull of the ReLU image is strictly smaller compared to the approximation of Ehlers [10]. It seems appealing to find an improved approximation of the ReLU image closer to the convex hull, which is the best possible convex approximation.
Subsequently, we propose an efficient method which can strengthen the approximation of Ehlers [10] by considering at least pairs of two neurons jointly. This new ReLU approximation is not independent (cf. Definition 2). The depiction in Fig. 5 shows, that in this situation we could actually add one inequality and would improve the approximation to be exactly the convex hull of the ReLU image. This inequality is induced by the connecting segment between the vertices of the convex hull that maximize y 1 or y 2 , respectively. Of course, we indicates the convex hull of this ReLU image. The approximation of Ehlers [10] is bounded by the orange segments, the naive approximation by the green ones (and coordinate axes) 6 Feasible set before (blue) and after ReLU application (red) for a different input polytope cannot make this inequality tighter, since otherwise feasible points of the ReLU image would be cut off. Though, the segment between the vertices that maximize y 1 or y 2 , respectively, does not always induce a valid inequality as we show in the following example. Figure 6 shows a polytope of feasible x 1 , x 2 values and the corresponding ReLU image of feasible values for y 1 and y 2 , such that y 1 = max{0, x 1 } and y 2 = max{0, x 2 }. The polytope is two dimensional, but can also be considered as embedded image of a higher dimensional polytope which is projected onto its variables x 1 and x 2 . These two variables correspond to two neurons in one layer of a ReLU neural network. The dimension of the original polytope is then the number of all neurons in that layer. It should be noted that we use these projections to R 2 only for the visualization of our method. The goal of our method is to obtain a tighter approximation without computing projections of higher dimensional polytopes. In Fig. 6, the segment (dashed line) between the vertices that maximize y 1 or y 2 , respectively, does not induce a valid inequality with respect to the ReLU image. Now the idea is to add an inequality to the model which partly cuts off the polytope resulting from the approximation of Ehlers [10], but leaves the ReLU image intact. The cut is parallel to the segment between the vertices that maximize y 1 or y 2 , respectively. Depending on the situation, these vertices will either meet the inequality with equality or not. Figure 7 depicts this inequality and shows that adding this constraint considerably improves the approximation of the convex hull. In the following we describe how this constraint can be computed. A linear Fig. 7 Here we see the ReLU image of the polytope depicted in Fig. 6 colored in red, its convex hull in black, the approximation of Ehlers [10] in orange, the inequality which we want to introduce as a black dashed line and the constraints of the naive approximation as a green dashed line. All sets are limited by the coordinate axes approximation of the ReLU neural network in question serves as a basis. Naturally, we can use the LP relaxation (which corresponds to the approximation of Ehlers [10]) if the verification problem is formulated as an MIP.
Assume we want to tighten the approximation for the ReLU output variables y 1 and y 2 which correspond to ReLU input variables x 1 and x 2 . All of these variables are contained in the LP relaxation of the neural network. In the final solution it must hold y 1 = max{0, x 1 } and y 2 = max{0, x 2 } due to the ReLU constraints. Letâ andb be the optimum solutions when maximizing x 1 or x 2 , respectively, in the current LP relaxation. Then we writeâ 1 and a 2 for the values of the variables x 1 and x 2 in the solutionâ. Analogously we writeb 1 andb 2 for the corresponding variable values in solutionb. It should be noted that these LP solutions are computed during the execution of OBBT, and can therefore be obtained at no additional cost. Obviously it holdsâ 1 ≥b 1 andb 2 ≥â 2 due to the choice of objective functions. Now we define a 1 := max{0,â 1 } and analogously a 2 , b 1 and b 2 . We compute new objective coefficients as c 1 := b 2 − a 2 and c 2 := a 1 − b 1 , i.e. c 1 , c 2 ≥ 0. The latter holds due to the fact that α ≥ β implies max{0, α} ≥ max{0, β} for α, β ∈ R. Again, we solve an LP using the current relaxation and maximize the objective function c 1 x 1 + c 2 x 2 . We denote the optimum objective value as γ and compute δ := c 1 a 1 + c 2 a 2 . After this computation we can strengthen the LP relaxation by adding the constraint (5) is a valid inequality with respect to the ReLU image corresponding to y 1 and y 2 . That means, constraint (5) can strengthen the LP relaxation of our MIP for the verification problem but cannot cut off any feasible solution.
Thus, the approximation of Ehlers [10] can be improved by adding constraints of type (5) to the LP relaxation of the model. Like this, we obtain a ReLU approximation which is not independent. Although we have to solve only one LP per pair of neurons, applying this method to all possible pairs of neurons would lead to an immense computational cost. Therefore, we select only some pairs of neurons for which it is likely to significantly strengthen the LP relaxation by adding the new inequality to our model. Though, our selection strategy as laid out in Rössig [24] was not able to outperform a baseline selection strategy, which selects neurons in a fixed, predetermined order. Yet, this technique, which we abbreviate as OBBT2, can significantly strenghten the LP relaxation and reduce the number of nodes in the branch-and-bound tree (see Table 10 in the "Appendix").

Primal heuristics
For the problem of neural network verification the use of primal heuristics lies in the quick falsification of incorrect properties. Surprisingly, even a trivial heuristic, which only performs random sampling within the set of feasible inputs, can often find counterexamples to incorrect properties quickly in contrast to standard MIP heuristics.
The idea of the random sampling heuristic as introduced by Bunel et al. [5]) is plain and simple: Given an instance Π = (X , Y , F) of the verification problem, we randomly pick x ∈ X and check whether F(x) ∈ Y . In case that F(x) / ∈ Y , we know that Π is refutable. Moreover, using the MIP formulation as optimization problem, the input vector x is also useful if it leads to a decrease of the primal bound, since this may help to tighten neuron bounds. In general it is not trivial to obtain x ∈ X , if X ⊂ R n is an arbitrary polytope. However, as mentioned in Remark 3, many of the instances we regard feature a polytope X which is actually a box. In this case, we simply pick x i ∈ [l i , u i ] uniformly at random for i ∈ [n], where l i , u i are the bounds of X for each component. This is performed similarly in [5,36]. Otherwise, if X is not a box, we solve an LP to obtain x ∈ X using a random objective function.
We propose another heuristic that can be used in addition to the random sampling heuristic. It is based on the local search proposed by Dutta et al. [8] for output range analysis of ReLU neural networks. Though, we omit the use of gradient information and fit the heuristic more naturally into the framework of MIP solving. The main idea is to fix all neurons in one of their phases, such that the optimization variant of neural network verification consists only in solving a linear program. We start with a feasible input x 0 ∈ X for the neural network and use forward propagation to compute the values of all neurons in the network. Then, for each ReLU neuron, we fix the binary variable d in (1) to zero or one, corresponding to the phase of the neuron that is determined by propagating x 0 through the network. Furthermore, the binary variables in the formulation of the maximum function for objective variable t are also fixed, such that t = max{z 1 , . . . , z k }. With all binary variables fixed, the MIP as described in Sect. 4 becomes an LP.
This LP is minimized with respect to variable t as objective function. After the first minimization LP has been solved, we choose a ReLU input variablex (corresponding to one ReLU neuron) of value zero if possible. For this variable, we switch the fixed value of the corresponding binary variabled from zero to one or vice versa. Then we optimize again and obtain a new input vectorx 0 ∈ X for the neural network. After that, we switch the fixing of another binary variable, whose corresponding ReLU input variable has value 0 in the solution. This process is iterated until we find a feasible counterexample, i.e. the optimal value of the LP is smaller than zero, or we reach a predefined iteration limit. In case that none of the ReLU input variables is equal to zero, we have to abort the procedure. It is easy to see that switching the fixings of the binary variables as described, can only reduce the objective value of the optimum LP solution.
In the following we describe, how we combine our LP based heuristic with the random sampling heuristic. First we use the random sampling heuristic to find an input vector x 0 ∈ X . The random sampling process and forward propagation are very fast, and therefore we try many (e.g. 1000) random inputs to find an input x 0 ∈ X . Out of all sampled input vectors, we select x 0 ∈ X such that it corresponds to the lowest value of objective variable t. The hope is that x 0 can be converted into an actual counterexample by computing a new input vector x 0 . This is given by the optimum LP solution after some ReLU phase switches as described. Instance Π is shown to be indeed refutable, if the value of t is below zero in this optimum LP solution.
Of course, both heuristics can be applied several times throughout the solving process in a branch-and-bound tree which we enable in our implementation. Our experimental evaluation shows that the LP based heuristic works quite successfully. In fact, the mean runtime on our evaluation set of SAT instances, as mentioned in Sect. 8, drops from 330.1 to 71.7 seconds if our LP based heuristic is employed. On the other hand, the mean runtime on our evaluation set of UNSAT instances increases only slightly from 915.3 to 943.7 seconds due to the application of our LP based heuristic.

Branching for neural network verification
The verification problem can be solved with a generic branch-and-bound approach as described by Bunel et al. [5]. If the problem is solved as an MIP, specific branching rules for neural network verification can be integrated into the MIP solving process to strongly speed up the process. Initial bounds are necessary for the formulation of the verification problem as an MIP model and can be obtained by one of the approximation methods as introduced in Sect. 5. Many relevant instances of the verification problem cannot be solved if an approximation of the network is computed only once. Specific branching rules can be used to split an instance into simpler ones which can be approximated better. One option is to split the set of feasible input vectors for an instance of the verification problem as in [5,35].
Given an instance Π = (X , Y , F) of the verification problem, the design of the domain branching rule is based on the assumption that X is a box. However, the branching rule can also be applied if X is not a box. We assume the existence of bounds l i ≤ x i ≤ u i for all x ∈ X ⊂ R n and i ∈ [n], cf. Remark 3. In case that X is a box, it holds X = [l 1 , u 1 ]×. . .×[l n , u n ], otherwise X ⊆ [l 1 , u 1 ] × . . . × [l n , u n ]. Bunel et al. [5] propose to select j ∈ [n] and split the domain of variable x j to subdomains [l j , The domains of all other variables x i , i ∈ [n]\{ j} are left unchanged, so that we obtain two sub-instances with smaller input domains.
The selection of the branching variable is very important for the performance of the branching rule, cf. [5]. In Bunel et al. [4], the selection depends on the depth in the branch-andbound tree and follows a fixed order. Bunel et al. [5] implement another selection rule, based on the approximation method of Wong and Kolter [38]. In our implementation we mainly use a selection rule "gradient" which is quite similar to the one used in Wang et al. [35]. For that, we extend the neural network F to another oneF, which encodes also the properties that shall be verified. It has output dimension one, and for a fixed input x ∈ X , the output is the same as the value of the objective variable t in the MIP formulation as optimization problem (3). We use a max-pooling layer to model the computation of the maximum in (3) in the neural networkF and refer to Bunel et al. [5] for more details on the construction. We compute the gradient ofF at the input vectors x 1 = (l 1 , . . . , l n ), x 2 = ( u 1 +l 1 2 , . . . , u n +l n 2 ), and x 3 = (u 1 , . . . , u n ) and let g := ∇F(x 1 ) + ∇F(x 2 ) + ∇F(x 3 ) ∈ R n . For i ∈ [n] we compute z i := |g i | · (u i − l i ) and choose the branching variable j ∈ [n] such that z j = max{z 1 , . . . , z n }. The intuition is that verifiability of the instance depends mainly on the values of input neurons with a high (averaged) absolute gradient value and sufficiently big input range.
Another natural possibility is to perform branching over the two phases of a ReLU neuron which is used in [6,10,17,36]. If the verification problem is formulated as an MIP, this corresponds exactly to branching over the binary variables which correspond to the ReLU neurons. Given a ReLU input variable x and the corresponding output variable y, we can branch the constraint y = max{0, x} into two cases: either x ≤ 0, y = 0, or x > 0, y = x. The main question is how to select the branching variables. As in Cheng et al. [6], we prioritize ReLU neurons which are located in the front of the neural network in the selection rule "standard". Similarly to Wang et al. [36], we implement a selection rule "gradient" that picks ReLU neurons which have a large gradient with respect to the outputs of the network. Though, in our experiments the rule "standard" performed better.

Computational evaluation
In this section we discuss computational results for various experiments that we conducted on a diverse set of test instances which we shortly present in the following. We empirically compare several methods for the computation of neuron bounds. Furthermore, we report computational results for various configurations of our solver and compare it with the programs of Bunel et al. [5], Wang et al. [36], and Katz et al. [17].
As we are not aware of any publicly available instances Π = (X , Y , F) of the verification problem where X is not a box (cf. Remark 4), we define such instances to show the capabilities of our solving model. As a basis we use the neural networks of the ACAS Xu system, which were used by Katz et al. [17] to create instances of the verification problem. These are described in "Appendix A". The ACAS Xu system is designed to prevent collisions of (autonomous) aircrafts. We also perform our evaluations on the original instances published by Katz et al. [17]. Furthermore, we use test instances with neural networks that are trained on the well known MNIST [18] handwritten digit data set. In fact, we use two of the trained neural networks published by Wang et al. [36] and verify robustness of classifications using the L ∞ norm. Each input neuron may have a value between 0 and 255 and we investigate four different perturbation radii (1, 5, 10, and 20). These networks have two layers, each of them with 24 or 512 neurons, respectively. The smaller one reaches an accuracy of 96.59 %, the one with 512 neurons per layer 98.27 %. In contrast to the neural networks of the ACAS Xu system with input dimension five, the input dimension of the MNIST networks is 784. This difference is especially interesting with respect to the performance of input domain branching as presented in Sect. 7. To evaluate various settings of our solving model, we use two evaluation subsets that contain a diverse selection of all the instances. One contains 13 SAT instances and the other one 23 UNSAT instances. In general, we compute average values for runtime and number of solving nodes as shifted geometric mean which reduces the sensitiviy to outliers, cf. [1,16].

Empirical comparison of bound computation approaches
We use our evaluation set of UNSAT instances for a numeric comparison of the bounds which are obtained by various bound computation approaches. For each instance Π = (X , Y , F) we compute lower and upper bounds [l i , u i ] for all neurons i ∈ [N ] (before application of the ReLU function), based on the feasible input domain X . The bounds are computed layerwise from the first to the last layer and branching is not applied. Then we compute the shifted geometric mean of {u 1 − l 1 , . . . , u N − l N } with shift value 1 for each instance. The mean value indicates how good the corresponding bound computation approach is. Clearly, it is desirable that the difference u i − l i is as small as possible for all neurons i ∈ [N ], and we use the mean value to compare the different approximation methods. See Table 8 in the "Appendix" for detailed results, here we report the overall mean.
As a second measure we report how many neurons can be fixed in their phase by the respective bound computation method. Detailed results on this can be found in Table 9 in the "Appendix".
The results in Tables 1 and 2 show that the best bounds are computed by OBBT on the MIP model. While OBBT2 computes better bounds compared to OBBT on the LP relaxation, the improvements are unfortunately quite minor. We also see that the approaches of Wang et al. [35,36] are clearly superior to naive interval arithmetic. Yet, they are not competitive with OBBT if regarding only the quality of the computed bounds.

Comparison of different techniques in our model
In this section we provide an overview of the performance of our solving model in various configurations. The experiments in this section are run for the instances of our evaluation set of UNSAT instances. All results are obtained on cluster nodes with Intel Xeon CPUs E5-2670 which have a clock rate of 2.5 GHz. Each experiment is run exlusively on one cluster node and a memory limit of 32 GB is set. In general we set a time limit of 7200 seconds. If the time limit is hit during the solving process, we assume the time limit as runtime for the corresponding instance.
We report results both for the whole UNSAT test set, and separately for the ACAS and MNIST based instances. Comparing ACAS and MNIST based instances, the main differences are the number of input neurons (5 vs. 784) and the number of layers in the neural networks (6 vs. 2).
For the experiments in this section we use a baseline configuration "no_heur_base" of our solving model. In this configuration, domain branching (with selection rule "gradient") is used up to a depth of 20 in the branch-and-bound tree. It should be noted that this depth is usually not exceeded if domain branching is used. Furthermore, OBBT is applied to the LP relaxation at each solving node up to a depth of 20 in the branch-and-bound tree. The primal heuristic is enabled only at the root node. Further techniques (e.g. our separator based on the work of Anderson et al. [2]) are not applied.
In Table 3 we report mean runtimes on the UNSAT test set for several different configurations of our solving model. Here we try to give an impression of the differences between the configurations which we tested. We are mostly interested in the computation of the dual bound by various methods and therefore the primal heuristic is only employed at the root node. Table 3 shows that there is a clear difference between solving MNIST and ACAS instances. In fact, the configuration "no_heur_base_genv" is the fastest in total because it performs well on both types of instances. However, it is not the best configuration for either of the two subsets (although close to the best configuration for ACAS instances). The best configurations with respect to the MNIST instances ("no_heur_relu_genv" with respect to number of timeouts, "mnist_base" with respect to mean runtime) do not perform well on the ACAS instances. The configurations in Table 3 are sorted by the total number of timeouts. In the following, we give short descriptions of the configurations in this order. The configuration "no_heur_base_genv" corresponds to the baseline configuration with the additional use of Langrangian variable   [2]. Next in Table 3 follows our baseline configuration which is hence quite good already. Indeed, "no_heur_base_obbt2_nosort" is the best of our configurations that use OBBT2 and it has a higher mean runtime. Regarding the mean runtime compared to the number of timeouts, the configuration "no_heur_base_mip" is not in line with the other configurations. Indeed, this configuration applies OBBT to the MIP model in the beginning of the solving process to compute initial neurons bounds. This comes with a high computational cost, also for rather easy instances. However, for more difficult instances, this stratey is not that bad as indicated by the relatively low number of timeouts. In the configuration "no_heur_relu_genv", ReLU branching is combined with OBBT on the LP relaxation and the creation of LVBs. Most notably, this configuration is the only one that solves all MNIST instances in the evaluation set within the time limit. Though, the performance on the ACAS based instances is rather mediocre. Due to the low number of input neurons of the ACAS neural networks, input domain branching is more efficient for these instances. The same holds for the configuration "no_heur_relu" which does not include the LVBs. Eventually, in the configuration "mnist_base" the solving process is limited to the application of standard MIP techniques. Notably enough, this configuration has the lowest mean runtime on the subset of MNIST instances. On the other hand, it times out on all ACAS instances. This vast difference can be attributed to the different number of layers in the neural networks (two for MNIST, six for ACAS), as deeper networks are more difficult to approximate.

Comparison with other solvers
In this section we provide detailed results of computational experiments which we conducted to investigate the performance of different solvers for neural network verification. Besides our own solving model, which we regard in various configurations, we include the programs of Bunel et al. [5], Wang et al. [36], and Reluplex by Katz et al. [17]. The work of Wang et al. [36] is implemented in two solvers, Neurify and ReluVal. ReluVal is used on the ACAS instances whereas Neurify is applied to the MNIST instances. We always check whether any alleged counterexample presented by some solver is indeed a feasible counterexample for the corresponding instance. In general, we perform these checks with an absolute numerical tolerance of 10 −8 . Several counterexamples are only feasible though, if a higher numerical tolerance is allowed, which we report in that case. Each experiment is run exclusively on one cluster node with an Intel Xeon Gold 5122 CPU which operates at a clock rate of 3.6 GHz, and we set a memory limit of 32 GB. Besides the program names ReluVal, Neurify and Reluplex, we use the following terms to denote the various solving models. Adv refers to ReluVal or Neurify using their adversary check mode which is focused on finding counterexamples. BaB denotes the branch-and-bound method of Bunel et al. [5]. Eventually, we use NonOpt to describe that our solving model is used with the formulation of the verification problem as feasibility problem. Joint refers to our solving model using the formulation as optimization problem, which can solve conjunction instances (cf. Remark 4). Separate indicates that we solve the verification problem as optimization problem, but conjunction instances are split into several disjunction instances as explained in Remark 4. Conjunction instances have to be splitted likewise for BaB and Reluplex. Based on the results of our evaluation experiments in Sect. 8.2, we choose the configuration "no_heur_sepa0_freq5" as the best for the ACAS instances. Though, in order for a good performance on refutable instances, we do adapt this configuration to execute the primal heuristic also locally up to a depth of eight in the branch-and-bound tree. Table 4 contains the runtime results for all ACAS instances of Properties 1, 2, 3, 4, and 8 as defined by Katz et al. [17]. It should be noted that these are all disjunction instances. The mean runtime of Reluplex is at least one order of magnitude larger than the mean runtimes of all other solvers. ReluVal is clearly superior to all other solvers, especially if its adversary check mode is applied. We see that our solving model, which uses SCIP [13] to strongly integrate the bound computations into a MIP framework, performs significantly better than the rather similar approach of Bunel et al. [5].
A similar picture of the performance of the various solvers can be seen in Table 5, which shows the results on the ACAS instances of Properties 5,6,7,9,and 10. Remind that we underestimate the runtime of instances for which the solving process is stopped due to the time limit.
In Tables 6 and 7 we report runtimes for the MNIST instances. We remind that all of these are conjunction instances by definition. For our solving model we use the configuration "mnist_base" as presented in Sect. 8.2. The instances which are based on neural networks with two layers of 24 neurons are solved very quickly by most solvers, as can be seen in Table 6. Though, domain branching is apparently not a good strategy to solve these instances, which feature 784 input neurons. This is shown by the high number of timeouts of the solving method BaB [5] on the MNIST instances in Table 6.
Clearly, the MNIST instances with 512 neurons per layer are much more challenging. In Table 7 we see that our solving model performs quite good when the approach Joint is taken, i.e. conjunction instances are solved directly as one optimization problem. In several cases Neurify aborts the solving process with no result. For the corresponding instances we assume the time limit of 7200 seconds as runtime in order to compute the mean runtime. If the conjunction instances are split into separate disjunction instances, our solving model mostly fails to find counterexamples within the time limit (see NonOpt and Separate). This can be explained by the fact that the instances, which are obtained after splitting, are solved Table 5 Runtimes on ACAS properties If Reluval is run in normal check mode, it terminates prematurely on Property 7 due to a lack of memory. We use the time limit of 7200 seconds for the mean computation in this case Table 6 Runtimes on the MNIST 24 data set The last number of the instance name is the perturbation radius. In contrast to all other solvers, Neurify reports SAT for instance mnist_24_image11_10. However, all counterexamples that Neurify produces, are only valid when applying a large numerical tolerance of 10 −3 . We regard the instance mnist_24_image11_10 as verifiable and exclude it from the mean computation. Counterexamples produced by Reluplex are valid with a numerical tolerance of 10 −5 sequentially one after the other. However, if a verifiable instances is processed first, this may already lead to a timeout. Reluplex is only able to solve two of the easiest instances within the time limit, and BaB also performs considerably worse than Neurify and our solving model. Though for all MNIST instances, it should be noted that the counterexamples which are produced by Neurify are only feasible if one applies a large numerical tolerance of 10 −3 . It is clear to see, that the instances with the lowest perturbation radius of 1 (in L ∞ norm) are easy to verify. On the other hand, for the instances with a high perturbation radius of 10 or 20, counterexamples are found in most cases. Apparently, the perturbation radius 5 poses the biggest difficulties for the solvers, as the corresponding instances are probably on the borderline between SAT and UNSAT.
In the "Appendix" we report our self defined instances where the input polytope X is not a box and computational results of our solving model on these, as the other solvers cannot process these instances. See Table 12 for the results which are obtained with the formulation as optimization problem, and Table 13 for the results corresponding to the formulation as feasibility problem.

Conclusions
Our solving model shows a solid performance in all categories of our benchmark set. This highlights the success of our approach, which combines MIP solving, using the solver SCIP [13], with specialized bound computation and branching techniques. Especially, we can solve instances with general polytopes as input domains which is not possible with other verification algorithms [5,17,36]. Moreover, our solving model clearly outperforms the solvers of Bunel et al. [5] and Katz et al. [17]. Subsequently, Reluplex cannot be regarded as a state-of-the-art solver for neural network verification anymore. While ReluVal and Neurify from Wang et al. [36] show impressive runtime results for most instances, they rely primarily on branching. For big instances, this can become problematic due to limited numerical accuracy and memory capacity.
Moreover, we developed a theoretical framework for the comparison of linear approximation methods and presented the novel approximation technique OBBT2 which is able to improve the linear relaxation of a ReLU neural network. Besides that, we showed how the local search procedure of Dutta et al. [8] can be implemented differently in an MIP solving context such that it serves as a primal heuristic. Additionally, we described a novel formulation of the verification problem as quadratic program. All newly proposed techniques were evaluated computationally within our newly implemented solving model. Although our implementation is limited to neural networks with ReLU activation function, the approach could be easily extended to work with other piecewise-linear activation functions. Examples for that are the leaky ReLU function or max-pooling layers. In addition, the great flexibility of our solving model allows the integration of further techniques such that future improvements may render it even more efficient.
In conclusion, it can be said that many challenges remain in the field of neural network verification. The further scalability of current verification approaches is still an open task. Besides, new approaches for the falsification of incorrect properties would be highly interesting. Although we present a new heuristic for that (based on ideas of Dutta et al. [8]), better algorithms could probably be developed.

Table 7
Runtimes on the MNIST 512 data set, where the last number of the instance name is the perturbation radius

4830.5
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
Input constraints for (i) 1_1_int_away and (ii) 1_1_int_away2. Desired output: (i) strong left is not minimal, or (ii) strong right is not minimal.
Input constraints for (i) 1_2_int_away and (ii) 1_2_int_away2. Desired output: (i) strong left is not minimal, or (ii) strong right is not minimal.
Input constraints for 2_1_var_dist and 3_1_var_dist. Desired output: COC is minimal.

Appendix B Tables and Figures
See Fig. 8 and Tables 8,9,10,11,12 and 13 Appendix C Proofs Definition 4 Let S ⊂ R 2n . Then, for i ∈ [n], we denote the embedded image (in R 2 ) of the orthogonal projection of S on the subspace span{e i , e i+n } as S i .

Lemma 1
Let P ⊂ R n be a ReLU proper polytope and Q ⊂ R 2n an independent ReLU approximation of P. Then it holds for all i ∈ [n] and allx ∈ P: Proof As in Definition 2, we use For the first part we see where (8) holds due to the definition of Q and the orthogonal projection, and (9) is rewritten. (10) holds, since the set is determined only by the values ofx i and y i . The values of x j and y j for j ∈ [n], j = i are not relevant for the set and therefore we do not need any constraints on these. Hence we can restrict the inequality in the set to the submatrices which are relevant for x i and y i . In (11) it suffices to considerx because all columns except the i-th column of A (i) are zero. We can now switch back to the original inequality in (12), as this affects only x j and y j for j ∈ [n], j = i. Then we can write the set as an intersection (13) and apply the definition of Q (14).
For the second part of the lemma, we stress that for (x, y) T ∈ Q each y k , k ∈ [n] is independent of the values of all other y l , l = k. This is an immediate consequence of the "IA" stands for interval arithmetic, i.e. symbolic IA is the approach of Wang et al. [35]. "Symbolic equations" refers to the improved method of Wang et al. [36]. We evaluate OBBT on the LP relaxation as well as on the MIP directly, and also our new technique OBBT2 with two different parameter settings. For OBBT on the MIP model, a time limit of five seconds is set for each MIP which is solved. In fact, the instances "property9_property_0" and "property9_property_4" differ only in the property to be verified, which explains the coinciding numbers See Table 8 for an explanation of the column names OBBT2 clearly reduces the number of solving nodes, which is computed as shifted geometric mean with shift value 10 (as the runtime mean values). We regard only those 14 instances of our UNSAT evaluation set, which are solved within the time limit by all methods. If the execution of a method is stopped due to the time limit, it is not reasonable to compare the number of solving nodes between different methods  Table 11 continued In adversary check mode, Reluval fails on property2_5_3 and reports UNSAT instead of SAT. We set the runtime to 7200 s for the mean computation in this case. We also assume a runtime of 7200 s for those three cases, for which Reluplex reports counterexamples that are not even valid with a numerical tolerance of 10 −3 . These are denoted as "wrong" in the table In the column status, "optimal" means that the problem was solved to optimality, while "bound" implies that the solving process was interrupted due to a positive dual or negative primal bound Using this formulation, there are no meaningful primal and dual bounds. The solution status shows that either infeasibility is detected, or a feasible, i.e. optimal, solution is found. Here we only show results for the disjunction instances in the test set definition of an independent ReLU approximation. Becausex ∈ P is fixed, we can find a lower and upper bound for each y k , k ∈ [n] which define the feasible range. Of course, these bounds can be infinite. Thus, we obtain (7).
Theorem 3 Let P ⊂ R n be a ReLU proper polytope and Q E be the approximation of P as in Definition 3. For any independent ReLU approximation Q of P it holds Q E ⊆ Q.
Proof Let i ∈ [n] be fixed and Q be an independent ReLU approximation of P. We define the set C i := (x i , y i ) | x ∈ P, y i = max{0, x i } and let Q i be the embedded image (in R 2 ) of the projection of Q on the variables x i and y i . We claim that it holds conv(C i ) ⊆ Q i . By definition of Q we have C i ⊆ Q i and Q i is a polyhedron, hence convex and therefore the claim holds. In the first part of this proof, we show that conv(C i ) = Q E i . To this end, we set where A (i) , B (i) and c (i) are defined according to Definition 3 and for our fixed index i. Obviously, we have Q E ⊆Q E which implies Q E i ⊆Q E i . We will show thatQ E i = conv(C i ), which allows us to conclude that conv(C i ) ⊆ Q E i ⊆Q E i = conv(C i ).
Let l i := min x∈P x i and u i := max x∈P x i . Then we find which is a direct consequence of the convexity of P. Because P is ReLU proper we have l i < 0 < u i . Thus we can write Hence, C i is the union of two one-dimensional polytopes and conv(C i ) has three vertices at coordinates (l i , 0), (0, 0) and (u i , u i ). These are exactly the vertices of the polytopes that constitute C i . Definition 3 establishes A (i) , B (i) and c (i) such that they imply the following constraints for (x, y) T ∈Q E : The segments between the vertices of conv(C i ) induce the following lines: (l i , 0), (0, 0) y i = 0 (0, 0), (u i , u i ) Taking the respective third vertex into account, we obtain exactly the constraints ofQ E and hence show thatQ E i = conv(C i ). Since we find that conv(C i ) = Q E i . Now, we will combine this result with Lemma 1 in order to prove the theorem.
To this end, we fix an arbitraryx ∈ P. Combining the last result with (6) and applying it to Q E , we obtain: Using conv(C i ) ⊆ Q i , which we showed in the beginning of the proof, and (6) applied to Q, gives

Thus we obtain
As a consequence of (7) in Lemma 1 we have is entirely determined by the projections for all i ∈ [n]. The same holds for Q E instead of Q. Since we fixed i ∈ [n] arbitrarily in the beginning, with (15) we are now able to conclude that Now fix an arbitrary (x,y) ∈ Q E . Sincex ∈ P was also arbitrary, we have which implies (x,y) ∈ Q. This proves Q E ⊆ Q.