Improving Neural Network Verification through Spurious Region Guided Refinement

We propose a spurious region guided refinement approach for robustness verification of deep neural networks. Our method starts by applying the DeepPoly abstract domain to analyze the network. If the robustness property cannot be verified, the result is inconclusive. Due to the over-approximation, the computed region in the abstraction may be spurious in the sense that it does not contain any true counterexample. Our goal is to identify such spurious regions and use them to guide the abstraction refinement. The core idea is to use the constraints obtained from the abstraction to infer new bounds for the neurons, which is achieved by linear programming. With the new bounds, we iteratively apply DeepPoly, aiming to eliminate spurious regions. We have implemented our approach in a prototype tool, DeepSRGR. Experimental results show that a large number of regions can be identified as spurious, and as a result, the precision of DeepPoly can be significantly improved. As a side contribution, we show that our approach can be applied to verify quantitative robustness properties.


Introduction
In the seminal work [34], deep neural networks (DNNs) were successfully applied in Go to play against expert humans. Since then, they have achieved exceptional performance in many other applications such as image, speech and audio recognition, self-driving cars, and malware detection. Despite this success, DNNs have also been shown to often lack robustness and to be vulnerable to adversarial samples [39]. Even for a well-trained DNN, a small (and even imperceptible) perturbation may fool the network. This is arguably one of the major obstacles to deploying DNNs in safety-critical applications like self-driving cars [42] and medical systems [33].
It is thus important to guarantee the robustness of DNNs for safety-critical applications. In this work, we focus on (local) robustness, i.e., given an input and a manipulation region around the input (which is usually specified according to a certain norm), we verify that a given DNN never makes any mistake on any input in the region. The first work on DNN verification was published in [30], which focuses on DNNs with sigmoid activation functions and uses a partition-refinement approach. In 2017, Katz et al. [20] and Ehlers [10] independently implemented Reluplex and Planet, two SMT solvers to verify DNNs with the ReLU activation function on properties expressible with SMT constraints. Since 2018, abstract interpretation has been one of the most popular methods for DNN verification, led by AI2 [13]; subsequent works like [36,37,23,1,35,28,24] have improved on AI2 in terms of efficiency, precision, and support for more activation functions (like sigmoid and tanh), so that abstract interpretation based approaches can be applied to DNNs of larger size and more complex structure.
Among the above methods, DeepPoly [37] is one of the most outstanding regarding precision and scalability. DeepPoly is an abstract domain specially developed for DNN verification. It takes the structures and the operators of a DNN into account, and it designs a polytope expression which not only fits these structures and operators to control the loss of precision, but also works with a very small time overhead to achieve scalability. However, as an abstract interpretation based method, it provides very little insight if it fails to verify the property. In this work, we propose a method to improve DeepPoly by eliminating spurious regions through abstraction refinement. A spurious region is a region computed using abstract semantics, conjoined with the negation of the property to be verified. This region is spurious in the sense that if the property is satisfied, then this region, although not empty, does not contain any true counterexample which can be realized in the original program. In this case, we propose a refinement strategy to rule out the spurious region, i.e., to prove that this region does not contain any true counterexamples.
Our approach is based on DeepPoly and improves it by refining the spurious region through linear programming. The core idea is to intersect the abstraction constructed by abstract interpretation with the negation of the property to generate a spurious region, and to perform linear programming on the constraints of the spurious region so that the bounds of the ReLU neurons whose behaviors are uncertain can be tightened. As a result, some of these neurons can be determined to be definitely activated or deactivated, which significantly improves the precision of the abstraction given by abstract interpretation. This procedure can be performed iteratively, and the precision of the abstraction is gradually improved, so that we are likely to rule out this spurious region in some iteration. If we successfully rule out all the possible spurious regions through such an iterative refinement, the property is soundly verified. Our method is similar in spirit to counterexample guided abstraction refinement (CEGAR) [6], i.e., we apply abstract interpretation for abstraction and linear programming for refinement. A fundamental difference is that we use the constraints of the spurious region, instead of a concrete counterexample (which is challenging to construct in our setting), as the guidance of refinement.
The same spurious region guided refinement approach is also effective in quantitative robustness verification. Instead of requiring that all inputs in the region be correctly classified, a certain probability of error in the region is allowed. Quantitative robustness is more realistic and general than ordinary robustness, and a DNN verified against quantitative robustness is useful in practice as well. The spurious region guided refinement approach naturally fits this setting, since a comparatively precise over-approximation of the spurious region implies a sound robustness confidence. To the best of our knowledge, for DNNs, this is the first work to verify quantitative robustness with a strict soundness guarantee, which distinguishes our approach from the previous sampling based methods like [45,46,3].
In summary, our main contributions are as follows:
- We propose spurious region guided refinement to verify robustness properties of deep neural networks. This approach significantly improves the precision of DeepPoly, and it can verify more challenging properties than DeepPoly.
- We implement the algorithms as a prototype and run them on networks trained on popular datasets like MNIST and ACAS Xu. The experimental results show that our approach significantly improves the precision of DeepPoly, successfully verifying much stronger robustness properties (larger maximum radius) and determining the behaviors of a great proportion of uncertain ReLU neurons.
- We apply our approach to solve the quantitative robustness verification problem with a strict soundness guarantee. In the experiments, we observe that, compared to using only DeepPoly, the bounds obtained by our approach can be up to two orders of magnitude better.
Organisation of the paper. We provide preliminaries in Section 2. DeepPoly is recalled in Section 3. We present our overall verification framework and the algorithm in Section 4, and discuss quantitative robustness verification in Section 5. Section 6 evaluates our algorithms through experiments. Section 7 reviews related works and concludes the paper.

Preliminaries
In this section we recall some basic notions on deep neural networks, local robustness verification, and abstract interpretation. Given a vector $x \in \mathbb{R}^m$, we write $x_i$ to denote its $i$-th entry for $1 \le i \le m$.

Robustness verification of deep neural networks
In this work, we focus on deep feedforward neural networks (DNNs), which can be represented as a function $f : \mathbb{R}^m \to \mathbb{R}^n$, mapping an input $x \in \mathbb{R}^m$ to its output $y = f(x) \in \mathbb{R}^n$. A DNN $f$ often classifies an input $x$ by obtaining the maximum dimension of the output, i.e., $\arg\max_{1 \le i \le n} f(x)_i$. We denote such a DNN by $C_f : \mathbb{R}^m \to C$, defined by $C_f(x) = \arg\max_{1 \le i \le n} f(x)_i$, where $C = \{1, \ldots, n\}$ is the set of classification classes. A DNN has a sequence of layers, including an input layer at the beginning, followed by several hidden layers, and an output layer at the end. The output of a layer is the input of the next layer. Each layer contains multiple neurons, the number of which is known as the dimension of the layer. The DNN $f$ is the composition of the transformations between layers. Typically an affine transformation followed by a non-linear activation function is performed. For an affine transformation $y = Ax + b$, if the matrix $A$ is not sparse, we call such a layer fully connected. A DNN with only fully connected layers and activation functions is a fully connected neural network (FNN). In this work, we focus on the rectified linear unit (ReLU) activation function, defined as $\mathrm{ReLU}(x) = \max(x, 0)$ for $x \in \mathbb{R}$. Typically, a DNN verification problem is defined as follows:
Definition 1. Given a DNN $f : \mathbb{R}^m \to \mathbb{R}^n$, a set of inputs $X \subseteq \mathbb{R}^m$, and a property $P \subseteq \mathbb{R}^n$, we need to determine whether $f(X) := \{f(x) \mid x \in X\} \subseteq P$ holds.
Local robustness describes the stability of the behaviour of a normal input under a perturbation. The range of input under this perturbation is the robustness region. For a DNN $C_f$ which performs classification tasks, a robustness property typically states that $C_f$ outputs the same class on the robustness region.
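There are various ways to define a robustness region, and one of the most popular ways is to use the $L_p$ norm. For $x \in \mathbb{R}^m$ and $1 \le p < \infty$, we define the $L_p$ norm of $x$ to be $\|x\|_p = (\sum_{i=1}^{m} |x_i|^p)^{1/p}$, and its $L_\infty$ norm to be $\|x\|_\infty = \max_{1 \le i \le m} |x_i|$. We write $\bar{B}_p(x, r) := \{x' \in \mathbb{R}^m \mid \|x - x'\|_p \le r\}$ to represent a (closed) $L_p$ ball for $x \in \mathbb{R}^m$ and $r > 0$, which is a neighbourhood of $x$ as its robustness region. If we set $X = \bar{B}_p(x, r)$ and $P = \{y \in \mathbb{R}^n \mid \arg\max_i y_i = C_f(x)\}$ in Def. 1, it is exactly the robustness verification problem. Hereafter, we set $p = \infty$.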
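As a minimal sketch of the robustness setting above, the following Python fragment builds the $L_\infty$ ball as a list of per-dimension intervals and checks membership of an output in the property $P$. The function names and the two-output `toy_network` are illustrative assumptions, not part of the original tool, and labels are 0-based here whereas the text uses $1, \ldots, n$.

```python
from typing import Callable, List, Sequence, Tuple


def linf_ball(x: Sequence[float], r: float) -> List[Tuple[float, float]]:
    """Closed L-infinity ball around x, stored as one interval per input dimension."""
    return [(xi - r, xi + r) for xi in x]


def classify(f: Callable[[Sequence[float]], Sequence[float]], x: Sequence[float]) -> int:
    """C_f(x): index of the largest output entry (0-based)."""
    y = f(x)
    return max(range(len(y)), key=lambda i: y[i])


def satisfies_property(y: Sequence[float], label: int) -> bool:
    """P holds when the output for `label` strictly dominates every other output."""
    return all(y[label] > y[t] for t in range(len(y)) if t != label)


def toy_network(x: Sequence[float]) -> List[float]:
    """Hypothetical two-input, two-output network used only for illustration."""
    return [x[0] + x[1], x[0] - x[1]]


if __name__ == "__main__":
    x = [1.0, 0.5]
    print(linf_ball(x, 0.25))
    print(classify(toy_network, x))
```

Strict dominance is used for $P$ so that its negation is the disjunction of the inequalities $y_{C_f(x)} - y_t \le 0$, which is the form the refinement works with later.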

Abstract interpretation for DNN verification
Abstract interpretation [7] is a static analysis method that aims to find an over-approximation of the semantics of programs and other complex systems so as to verify their correctness. Generally we have a function $f : \mathbb{R}^m \to \mathbb{R}^n$ representing the concrete program, a set $X \subseteq \mathbb{R}^m$ representing the property that the input of the program satisfies, and a set $P \subseteq \mathbb{R}^n$ representing the property to verify. The problem is to determine whether $f(X) \subseteq P$ holds. However, in many cases it is difficult to calculate $f(X)$ and to determine whether $f(X) \subseteq P$ holds. Abstract interpretation uses abstract domains and abstract transformations to over-approximate sets and functions so that an over-approximation of the output can be obtained efficiently.
Now we have a concrete domain $C$, which includes $X$ as one of its elements. To make computation efficient, we need an abstract domain $A$ to abstract elements in the concrete domain. We assume that there is a partial order $\le$ on $C$ and $A$, which in our setting is the subset relation $\subseteq$. We also have a concretization function $\gamma : A \to C$ which maps an abstract element to its concrete semantics, and $\gamma(a)$ is the least upper bound of the concrete elements that can be soundly abstracted by $a \in A$. Naturally, $a \in A$ is a sound abstraction of $c \in C$ if and only if $c \le \gamma(a)$.
The design of an abstract domain is one of the most important problems in abstract interpretation because it determines the efficiency and precision. In practice, we use a certain type of constraints to represent the abstract elements in an abstract domain. Classical abstract domains for Euclidean spaces include Box, Zonotope [14,15], and Polyhedra [38].
Not only do we need abstract domains to over-approximate sets, but we also need to over-approximate functions. Here we consider the lifting of the function $f : \mathbb{R}^m \to \mathbb{R}^n$ to sets, i.e., $T_f : \mathcal{P}(\mathbb{R}^m) \to \mathcal{P}(\mathbb{R}^n)$ with $T_f(X) = f(X)$. An abstract transformer $T_f^\#$ maps abstract elements to abstract elements, and it is sound if $T_f \circ \gamma \subseteq \gamma \circ T_f^\#$. When we have a sound abstraction $X^\# \in A$ of $X$ and a sound abstract transformer $T_f^\#$ such that $\gamma(T_f^\#(X^\#)) \subseteq P$, the property $P$ is successfully verified. Obviously, verification through abstract interpretation is sound but not complete. Hereafter, we write $f^\#$ to represent $T_f^\#$ for simplicity. AI2 [13] first adopted abstract interpretation to verify DNNs, and many subsequent works like [36,37,23] focused on improving its efficiency and precision through, e.g., defining new abstract domains. As a deep neural network, the function $f : \mathbb{R}^m \to \mathbb{R}^n$ can be regarded as a composition $f = f_l \circ \cdots \circ f_1$ of its $l + 1$ layers, where $f_j$ performs the transformation between the $j$-th and the $(j + 1)$-th layer, i.e., it can be an affine transformation or a ReLU operation. If we choose Box, Zonotope, or Polyhedra as the abstract domain, then for linear transformations and the ReLU functions, their abstract transformers have been developed in [13]. After we have abstract transformers $f_j^\#$ for these $f_j$, we can conduct abstract interpretation layer by layer as
$f^\#(X^\#) = f_l^\# \circ \cdots \circ f_1^\#(X^\#)$.
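To make the layer-by-layer scheme concrete, here is a small sketch using the Box domain, the simplest of the classical domains mentioned above. It is written in plain Python for illustration only; the helper names and the toy weights are assumptions and do not come from AI2 or DeepPoly.

```python
from typing import Callable, List, Sequence, Tuple

Interval = Tuple[float, float]
Box = List[Interval]


def affine_box(box: Box, weights: Sequence[Sequence[float]], bias: Sequence[float]) -> Box:
    """Sound Box transformer for y = Wx + b: pick the interval end that matches each sign."""
    out: Box = []
    for row, b in zip(weights, bias):
        lo = hi = b
        for w, (l, u) in zip(row, box):
            if w >= 0:
                lo += w * l
                hi += w * u
            else:
                lo += w * u
                hi += w * l
        out.append((lo, hi))
    return out


def relu_box(box: Box) -> Box:
    """Sound Box transformer for ReLU: clamp both interval ends at zero."""
    return [(max(l, 0.0), max(u, 0.0)) for l, u in box]


def box_interpret(box: Box, layers: Sequence[Callable[[Box], Box]]) -> Box:
    """Apply the abstract transformers f_1#, ..., f_l# in order."""
    for transformer in layers:
        box = transformer(box)
    return box


# Hypothetical 2-2-1 network used only for illustration.
W1, B1 = [[1.0, 1.0], [1.0, -1.0]], [0.0, 0.0]
W2, B2 = [[1.0, 1.0]], [0.5]
LAYERS = [lambda b: affine_box(b, W1, B1), relu_box, lambda b: affine_box(b, W2, B2)]

if __name__ == "__main__":
    print(box_interpret([(-1.0, 1.0), (-1.0, 1.0)], LAYERS))
```

On this toy network the concrete output range is $[0.5, 2.5]$, while the Box analysis returns $[0.5, 4.5]$: the result is sound but imprecise, which is the kind of gap that more expressive domains such as DeepPoly aim to narrow.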

A Brief Introduction to DeepPoly
Our approach relies on the abstract domain DeepPoly [37], which is the state-of-the-art abstract domain for DNN verification. It defines the abstract transformers of multiple activation functions and layers used in DNNs. The core idea of DeepPoly is to give every variable an upper and a lower bound in the form of an affine expression using only variables that appear before it. It can express a polyhedron globally. Moreover, experimentally, it often has better precision than the Box and Zonotope domains. We denote the $n$-dimensional DeepPoly abstract domain by $A_n$. Formally, an abstract element $a \in A_n$ is a tuple $(a^{\le}, a^{\ge}, l, u)$, where $a^{\le}$ and $a^{\ge}$ give the $i$-th variable $x_i$ a lower bound and an upper bound, respectively, in the form of a linear combination of variables which appear before it, i.e., $\sum_{k=1}^{i-1} w_k x_k + w_0$, for $i = 1, \ldots, n$, and $l, u \in \mathbb{R}^n$ give the lower bound and upper bound of each variable, respectively. The concretization of $a$ is defined as
$\gamma(a) = \{x \in \mathbb{R}^n \mid a_i^{\le} \le x_i \le a_i^{\ge},\ i = 1, \ldots, n\}$. (1)
The abstract domain $A_n$ also requires that its abstract elements $a$ satisfy the invariant $\gamma(a) \subseteq [l, u]$. This invariant helps construct efficient abstract transformers. For an affine transformation $x_i = \sum_{j=1}^{i-1} w_j x_j + w_0$, we set $a_i^{\le} = a_i^{\ge} = \sum_{j=1}^{i-1} w_j x_j + w_0$. By substituting the variables $x_j$ appearing in $a_i^{\le}$ with $a_j^{\le}$ or $a_j^{\ge}$ according to the sign of their coefficients at most $i - 1$ times, we can obtain a sound lower bound in the form of a linear combination of the input variables, from which $l_i$ can be computed using the ranges of the input variables; $u_i$ is computed analogously. For a ReLU transformation $x_i = \mathrm{ReLU}(x_j)$, we consider two cases:
- If $l_j \ge 0$ or $u_j \le 0$, this ReLU neuron is definitely activated or deactivated, respectively. In this case, this ReLU transformation actually performs an affine transformation, and thus its abstract transformer can be defined as above.
- If $l_j < 0$ and $u_j > 0$, the behavior of this ReLU neuron is uncertain, and we need to over-approximate this relation with a linear upper/lower bound. The best upper bound is $a_i^{\ge} = \frac{u_j (x_j - l_j)}{u_j - l_j}$. For the lower bound, there are multiple choices $a_i^{\le} = \lambda x_j$ with $\lambda \in [0, 1]$. We choose $\lambda \in \{0, 1\}$ which minimizes the area of the constraints. Basically we have two abstraction modes here, corresponding to the two choices of $\lambda$.
Note that for a DNN with only ReLU as non-linear operators, over-approximation occurs only when there are uncertain ReLU neurons, which are over-approximated using a triangle. The key to improving the precision is thus to compute the bounds of the uncertain ReLU neurons as precisely as possible, and to determine the behaviors of as many uncertain ReLU neurons as possible.
DeepPoly also supports activation functions which are monotonically increasing, convex on (−∞, 0] and concave on [0, +∞), like sigmoid and tanh, and it supports max pooling layers. Readers can refer to [37] for details.
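The ReLU case analysis described earlier in this section can be sketched in a few lines of Python. This is an illustrative reading of the rule "choose $\lambda \in \{0, 1\}$ which minimizes the area", not the DeepPoly implementation; the function name and the tie-breaking choice ($\lambda = 0$ when both areas are equal) are assumptions.

```python
from typing import Tuple


def deeppoly_relu(l: float, u: float) -> Tuple[float, float, float]:
    """Relax y = ReLU(x) for x in [l, u].

    Returns (lam, slope, intercept) meaning
        lam * x <= y <= slope * x + intercept.
    """
    if l >= 0:          # definitely activated: y = x
        return 1.0, 1.0, 0.0
    if u <= 0:          # definitely deactivated: y = 0
        return 0.0, 0.0, 0.0
    # Uncertain neuron: the upper bound is the line through (l, 0) and (u, u).
    slope = u / (u - l)
    intercept = -slope * l
    # Area enclosed between the upper line and the lower bound y >= lam * x.
    area_lam0 = 0.5 * u * (u - l)      # lower bound y >= 0
    area_lam1 = 0.5 * (-l) * (u - l)   # lower bound y >= x
    lam = 0.0 if area_lam0 <= area_lam1 else 1.0
    return lam, slope, intercept


if __name__ == "__main__":
    print(deeppoly_relu(-1.0, 3.0))
    print(deeppoly_relu(-3.0, 1.0))
```

The two return values of `lam` correspond to the two abstraction modes. Since the choice depends on $l$ and $u$, tightening the bounds of a neuron can switch its mode, which is relevant to the convergence discussion in Section 4.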

Spurious Region Guided Refinement
We explain the main steps of our algorithm, as depicted in Fig. 1. For the input property and network, we first employ DeepPoly as the initial step to compute $f^\#(X^\#)$. The concretization of $f^\#(X^\#)$ is the conjunction of many linear inequalities given in Eq. 1, and for the robustness property $P$, the negation $\neg P$ is the disjunction of several linear inequalities, one for each label $t \ne C_f(x)$, of the form $y_{C_f(x)} - y_t \le 0$.
1. For each such label $t$, we check whether $f^\#(X^\#) \cap^\# \neg P = \bot$ holds for the corresponding disjunct. This check follows the same method as DeepPoly, i.e., we compute the lower bound of $y_{C_f(x)} - y_t$ and see whether it is larger than 0. If so, the input cannot be classified as the label $t$, since $t$ is dominated by $C_f(x)$. Otherwise, we have $f^\#(X^\#) \cap^\# \neg P \ne \bot$, and we take the conjunction $\gamma(f^\#(X^\#)) \wedge \neg P$ as a potential spurious region, which represents the intersection of the abstraction of the real semantics and the negation of the property to verify. We call such a region spurious because if the property is satisfied, then this region does not contain a true counterexample, i.e., a pair of input and output $(x^*, y^*)$ such that $y^* = f(x^*)$ and $y^*$ violates the property $P$. In this case, this region is spuriously constructed due to the abstraction of the real semantics, where the counterexamples cannot be realized, and thus we aim to rule out the spurious region.
2. If no potential spurious region is found, our algorithm safely returns yes.
3. Assume now that we have a potential spurious region. The core idea is to use the constraints of the spurious region to refine this spurious region. Here a natural way to refine the spurious region is linear programming, since all the constraints here are linear inequalities. If the linear program is infeasible, it indicates that the region is spurious, and thus we can return an affirmative result. Otherwise, our refinement will tighten the bounds of variables involved in the DNN, especially the input variables and uncertain ReLU neurons, and these tightened bounds help give a more precise abstraction.
4. As our approach is based on DeepPoly, we likewise cannot guarantee completeness. We set a threshold $N$ on the number of iterations as a simple termination condition. If the termination condition is not reached, we run DeepPoly again and return to the first step.
Below we give an example, illustrating how refinement can help in robustness verification.
We fail to verify the property in Example 1 because for the uncertain ReLU relation $y_1 = \mathrm{ReLU}(x_3)$, the abstraction is imprecise, and the key to making the abstraction more precise here is to obtain as tight a bound as possible for $x_3$.

Example 2.
We use the constraints in Fig. 2(a) and additionally the constraint $y_2 - y_1 \le 0$ (i.e., $\neg P$) as the input of linear programming. Our aim is to obtain a tighter bound of the input neurons $x_1$ and $x_2$, as well as the uncertain ReLU neuron $x_3$, so the objective functions of the linear programming are $\min x_i$ and $\min -x_i$ for $i = 1, 2, 3$. All three neurons have tighter bounds after the linear programming (see the red part in Fig. 2(b)). Fig. 2(b) shows the run of DeepPoly under these new bounds, where the input range and the abstraction of the uncertain ReLU neuron are both refined. Now the lower bound of $y_2 - y_1$ is 0.25, so DeepPoly successfully verifies the property.

Main algorithm
Alg. 1 presents our algorithm. First we run abstract interpretation to find the uncertain neurons and the spurious regions (Lines 2-5). For each possible spurious region, we have a while loop which iteratively refines the abstraction. In each iteration we perform linear programming to renew the bounds of the input neurons and uncertain ReLU neurons; when we find that the bound of an uncertain ReLU neuron becomes definitely non-negative or non-positive, the ReLU behavior of this neuron is renewed (Lines 14-20). We use them to guide abstract interpretation in the next step (Lines 21-22). Here in Line 22, we make sure that during the abstract interpretation, the abstraction of previously uncertain neurons (namely the uncertain neurons before the linear programming step in the same iteration) is forced to follow the new bounds and new ReLU behaviors given by the current $C_{\ge 0}$, $C_{\le 0}$, $l$, and $u$, where these bounds will not be renewed by abstract interpretation, and the concretization of $Y$ is defined as
$\gamma(Y) = \{x \mid \forall i.\ a_i^{\le} \le x_i \le a_i^{\ge}\} \cap [l, u]$. (2)
The while loop ends when either (i) we find that the spurious region is infeasible (Lines 11, 24), in which case we label it Verified True and proceed to refine the next spurious region, or (ii) we reach the terminating condition and fail to rule out this spurious region, in which case we return UNKNOWN. If every while loop ends with the label Verified True, we have successfully ruled out all the spurious regions and return YES. An observation is that, if some spurious regions have been ruled out, we can add the constraints of their negation to make the current spurious region smaller so as to improve the precision (Line 9).
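The control flow described above can be sketched as a small driver in Python. This is only a skeleton under stated assumptions: the callable `refine` is hypothetical and stands for one iteration of the while loop (linear programming on the current abstraction and the spurious region, followed by a guided DeepPoly run), returning `None` when the region is found infeasible. The sketch omits the optimization of adding the negation of regions that were already ruled out.

```python
from typing import Callable, Iterable, Optional, TypeVar

A = TypeVar("A")  # type of an abstract element
R = TypeVar("R")  # type of a spurious region description


def srgr_verify(
    initial: A,
    regions: Iterable[R],
    refine: Callable[[A, R], Optional[A]],
    max_iters: int = 5,
) -> str:
    """Return "YES" if every potential spurious region is ruled out, else "UNKNOWN"."""
    for region in regions:
        current = initial
        ruled_out = False
        for _ in range(max_iters):           # termination threshold N
            refined = refine(current, region)
            if refined is None:              # infeasible: the region is spurious
                ruled_out = True
                break
            current = refined                # tighter abstraction for the next round
        if not ruled_out:
            return "UNKNOWN"
    return "YES"


def toy_refine(slack: int, step: int) -> Optional[int]:
    """Toy stand-in for refinement: shrink an integer slack until nothing is left."""
    remaining = slack - step
    return None if remaining <= 0 else remaining


if __name__ == "__main__":
    print(srgr_verify(3, [1, 2], toy_refine))
    print(srgr_verify(3, [1], toy_refine, max_iters=2))
```

Passing the refinement step as a callable keeps the skeleton independent of any particular LP solver or abstract domain implementation.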
Here we discuss the soundness of Alg. 1. We focus on the while loop and claim that it has the following loop invariant (Invariant 1): The abstract element $Y$ over-approximates the intersection of the semantics of $f$ on $\bar{B}_\infty(x, r)$ and the spurious region, i.e., $f(\bar{B}_\infty(x, r)) \cap \mathit{Spu} \subseteq \gamma(Y)$.

Algorithm 1 Spurious region guided robustness verification
Input: DNN f , input x, radius r.
The box $X$ is obtained by linear programming on $Y \wedge \mathit{Spu}$, and $f^\#(X)$ is calculated through abstract interpretation and the bounds given by linear programming on $Y \wedge \mathit{Spu}$, and thus it remains an over-approximation. It is worth mentioning that, when we run DeepPoly in Line 22, we are using the bounds obtained by linear programming to guide DeepPoly, and this may violate the invariant $\gamma(a) \subseteq [l, u]$ mentioned in Sect. 3. Nonetheless, soundness still holds since the concretization of $Y$ is newly defined in Eq. 2, where both items in the intersection over-approximate $f(\bar{B}_\infty(x, r)) \cap \mathit{Spu}$. With Invariant 1, Alg. 1 returns YES if for any possible spurious region $\mathit{Spu}$, the over-approximation of $f(\bar{B}_\infty(x, r)) \cap \mathit{Spu}$ is infeasible, which implies the soundness of Alg. 1.

Iterative refinement of the spurious region
Here we present more theoretical insight on the iterative refinement of the spurious region. An iteration of the while loop in Alg. 1 can be represented as a function $L : A \to A$, where $A$ is the DeepPoly domain. An interesting observation is that the abstract transformer $f^\#$ in the DeepPoly domain is not necessarily increasing, because different input ranges, even if one includes the other, may lead to different choices of the abstraction mode of some uncertain ReLU neurons, which may violate the inclusion relation of the abstractions. We have found such examples during our experiments, which implies that the transformer $f^\#$ is not increasing.
This fact also implies that $L$ is not necessarily increasing, which violates the condition of Kleene's fixed-point theorem [4]. Now we turn to the analysis of the sequence $\{Y_k\}$ of abstract elements produced by successive iterations of the while loop. Lemma 1 implies that if our sequence $\{Y_k\}$ is decreasing, then the iterative refinement converges to an abstract element in DeepPoly, which is the greatest fixed point of $L$ that is smaller than $f^\#(\bar{B}_\infty(x, r))$. A sufficient condition for $\{Y_k\}$ being decreasing is that during the abstract interpretation in every $Y_k$, every initially uncertain neuron maintains its abstraction mode, i.e., its corresponding $\lambda$ does not change, before its ReLU behavior is determined. A weaker sufficient condition for convergence is that changes in the abstraction mode of uncertain neurons never happen after finitely many iterations.
If the abstraction mode of uncertain neurons changes infinitely often, generally the sequence $\{Y_k\}$ does not converge. In this case, we can consider its subsequence in which every $Y_k$ is obtained with the same abstraction mode. It is easy to see that such a subsequence must be decreasing and thus has a meet, which is an accumulation point of the sequence $\{Y_k\}$. Since there are only finitely many choices of abstraction modes, such an accumulation point exists in $\{Y_k\}$, and there are only finitely many accumulation points. We summarize these results in the following theorem, which describes the convergence behavior of our iterative refinement of the spurious region.
Proof. Since the abstraction modes of uncertain ReLU neurons have only finitely many choices, there must be one which happens infinitely often in the computation of the sequence $\{Y_k\}$, and we choose the subsequence $\{Y_{n_k}\}$ in which every item is computed through this abstraction mode. Obviously $\{Y_{n_k}\}$ is decreasing and thus has a meet. For a decreasing subsequence $\{Y_{n_k}\}$, we can find a subsequence of it in which the abstraction mode of uncertain ReLU neurons does not change, and the two have the same meet. Since there are only finitely many choices of abstraction modes of uncertain ReLU neurons, such accumulation points of $\{Y_k\}$ also take only finitely many values. If exactly one abstraction mode of uncertain ReLU neurons happens infinitely often, obviously there is only one accumulation point in $\{Y_k\}$.

Optimizations
In the implementation of our main algorithm, we propose the following optimizations to improve the precision of refinement.
Optimization 1: More precise constraints in linear programming. In Line 15 of Alg. 1, it is not the best choice to take the linear constraints in the abstract element $Y$ into linear programming, because the abstraction of uncertain ReLU neurons in DeepPoly is not the best. Planet [10] has a component which gives a more precise linear approximation for uncertain ReLU relations, where it uses the linear constraints $y \le \frac{u(x - l)}{u - l}$, $y \ge x$, $y \ge 0$ to over-approximate the relation $y = \mathrm{ReLU}(x)$ with $x \in [l, u]$.
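The following sketch writes the three constraints above in the generic form $a x + b y \le c$, which is how they would typically be handed to an LP modeling layer. The helper names are illustrative assumptions; no solver is invoked here, the code only checks whether a candidate point satisfies the relaxation.

```python
from typing import List, Tuple

# One linear constraint a*x + b*y <= c over the pre-activation x and post-activation y.
Constraint = Tuple[float, float, float]


def triangle_constraints(l: float, u: float) -> List[Constraint]:
    """Triangle relaxation of y = ReLU(x) for an uncertain neuron with l < 0 < u."""
    if not (l < 0 < u):
        raise ValueError("the triangle relaxation is only needed for uncertain neurons")
    s = u / (u - l)
    return [
        (-s, 1.0, -s * l),   # y <= u * (x - l) / (u - l)
        (1.0, -1.0, 0.0),    # y >= x
        (0.0, -1.0, 0.0),    # y >= 0
    ]


def satisfies(constraints: List[Constraint], x: float, y: float, tol: float = 1e-12) -> bool:
    """Check whether the point (x, y) lies inside the relaxation."""
    return all(a * x + b * y <= c + tol for a, b, c in constraints)


if __name__ == "__main__":
    cons = triangle_constraints(-1.0, 3.0)
    print(satisfies(cons, 2.0, 2.0))     # a point on the graph of ReLU
    print(satisfies(cons, 1.0, 0.5))     # allowed by the lambda = 0 mode of DeepPoly
    print(satisfies(cons, -0.5, -0.25))  # allowed by the lambda = 1 mode of DeepPoly
```

The last two points show why the triangle is tighter: each of them is admitted by one of the two DeepPoly abstraction modes but is cut off when both lower bounds $y \ge x$ and $y \ge 0$ are imposed together.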
Optimization 2: Priority to work on small spurious regions. In Line 6 of Alg. 1, we determine the order of refining the spurious regions based on their sizes, i.e., a smaller region is chosen earlier. This is based on the intuition that Alg. 1 works effectively if the spurious region is small. After the small spurious regions are ruled out, the constraints of large spurious regions can be tightened with the conjunction of the negations of the regions already ruled out. It is difficult to strictly determine which spurious region is the smallest, and thus we refer to the lower bound of $y_{C_f(x)} - y_{t_i}$ given by DeepPoly, i.e., the larger this lower bound is, the smaller the spurious region is likely to be, and we perform the for loop in Line 6 of Alg. 1 in this order.
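The ordering heuristic of Optimization 2 amounts to a sort. In the sketch below, `lower_bounds` is a hypothetical mapping from each label $t \ne C_f(x)$ to the DeepPoly lower bound of $y_{C_f(x)} - y_t$; labels whose bound is already positive are verified and need no refinement.

```python
from typing import Dict, List


def order_spurious_regions(lower_bounds: Dict[int, float]) -> List[int]:
    """Order the labels of potential spurious regions, likely-smallest region first.

    A larger (closer to zero) lower bound suggests a smaller spurious region,
    so candidates are sorted by their lower bound in descending order.
    """
    candidates = {t: lb for t, lb in lower_bounds.items() if lb <= 0}
    return sorted(candidates, key=lambda t: candidates[t], reverse=True)


if __name__ == "__main__":
    bounds = {1: 0.3, 2: -0.5, 3: -0.1, 4: -2.0}
    print(order_spurious_regions(bounds))
```

In this example label 1 is dropped because its lower bound is positive, and the remaining labels are processed in the order 3, 2, 4.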

Quantitative Robustness Verification
In this section we recall the notion of quantitative robustness and show how to verify a quantitative robustness property of a DNN with spurious region guided refinement.
In practice, we may not need a strict condition of robustness to ensure that an input x is not an adversarial example. A notion of mutation testing is proposed in [44,43], which requires that an input x is normal if it has a low label change rate on its neighbourhood. They follow a statistical way to estimate the label change rate of an input, which motivates us to give a formal definition of the property showing a low label change rate, and to consider the verification problem for such a property. Below we recall the definition of quantitative robustness [27], where we have a parameter 0 < η ≤ 1 representing the confidence of robustness.
Def. 2 has a tight association with label change rate, i.e., if $x$ is $\eta$-robust, then the label change rate should be smaller than, or close to, $1 - \eta$. Hereafter, we set $\mu$ to be the uniform distribution on $\bar{B}_\infty(x, r)$.
It is natural to adapt spurious region guided refinement to quantitative robustness verification. In Alg. 1, we do not return UNKNOWN when we cannot rule out a spurious region, but record the volume of the box $X$ as an over-approximation of the Lebesgue measure of the spurious region. After we work on all the spurious regions, we calculate the sum of these volumes and obtain a sound robustness confidence. Here we do not calculate the volume of the spurious region itself because precise calculation of the volume of a high-dimensional polytope remains open, and we do not use randomized algorithms because they may not be sound.
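A minimal sketch of the confidence computation described above, assuming the uniform distribution on an unclipped ball $\bar{B}_\infty(x, r)$ of volume $(2r)^m$. The function names are illustrative, and `math.prod` requires Python 3.8 or later.

```python
import math
from typing import List, Sequence, Tuple

Interval = Tuple[float, float]


def box_volume(box: Sequence[Interval]) -> float:
    """Lebesgue measure of an axis-aligned box."""
    return math.prod(u - l for l, u in box)


def robustness_confidence(remaining_boxes: List[Sequence[Interval]], dim: int, r: float) -> float:
    """Sound lower bound on eta from the input boxes of regions that were not ruled out.

    Overlapping boxes are counted more than once, which only makes the bound
    more conservative.
    """
    ball_volume = (2.0 * r) ** dim
    spurious_volume = sum(box_volume(b) for b in remaining_boxes)
    return max(0.0, 1.0 - spurious_volume / ball_volume)


if __name__ == "__main__":
    boxes = [[(0.0, 0.25), (0.0, 0.5)], [(0.5, 1.0), (0.0, 0.25)]]
    print(robustness_confidence(boxes, dim=2, r=0.5))
```

For inputs with hundreds of dimensions the raw volumes underflow in floating point, so a practical implementation would more likely work with logarithms of volume ratios; the sketch keeps the direct formula for readability.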
We further improve the algorithm through the powerset technique [13]. The powerset technique is a classical and effective way to enhance the precision of abstract interpretation. We split the input region into several subsets and run abstract interpretation on these subsets. In our quantitative robustness verification setting, the powerset technique not only improves the precision, but also accelerates the algorithm in some situations: If the subsets have the same volume, and the percentage of the subsets on which we may fail to verify robustness is already smaller than $1 - \eta$, then we have successfully verified the $\eta$-robustness property.
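The shortcut for equal-volume subsets can be sketched as follows. The sketch assumes that $\eta$-robustness means the correctly classified part of the region has measure at least $\eta$, and `verified_flags` is a hypothetical list recording, per subset, whether robustness was verified on it.

```python
from typing import Sequence


def eta_robust_by_splits(verified_flags: Sequence[bool], eta: float) -> bool:
    """Decide eta-robustness from equal-volume splits of the input region.

    Each verified split contributes 1/len(verified_flags) of the measure, so the
    verified fraction is a sound lower bound on the robustness confidence.
    """
    if not verified_flags:
        raise ValueError("at least one split is required")
    verified_fraction = sum(1 for ok in verified_flags if ok) / len(verified_flags)
    return verified_fraction >= eta


if __name__ == "__main__":
    flags = [True] * 29 + [False] * 3   # 32 splits, 3 of them not verified
    print(eta_robust_by_splits(flags, eta=0.9))
```

When this check succeeds, no refinement of the unverified splits is needed, which is where the speed-up mentioned above comes from.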

Experimental Evaluation
We implement our approach as a prototype called DeepSRGR. The implementation is based on a re-implementation of the ReLU and the affine abstract transformers of DeepPoly in Python 3.7, and we amend it accordingly to implement Alg. 1. We use CVXPY [8] as our modeling language for convex optimization problems and CBC [18] as the LP solver. It is worth mentioning that we ignore the floating point error in our re-implementation of DeepPoly because sound linear programming currently does not scale in our experiments. In the terminating condition, we set N = 5. The two optimizations in Sect. 4.3 are adopted in all the experiments. All the experiments are conducted on a CentOS 7.7 server with 16 Intel Xeon Platinum 8153 @2.00GHz (16 cores) and 512G RAM, and they use 96 sub-processes concurrently at most. Readers can find all the source code and other experimental materials at https://iscasmc.ios.ac.cn/ToolDownload/?Tool=DeepSRGR.
Datasets. We use MNIST [22] and ACAS Xu [12,17] as the datasets in our experiments. MNIST contains 60 000 grayscale handwritten digits of the size 28 × 28. We can train DNNs to classify the images by the written digits on them. The ACAS Xu system aims to avoid airborne collisions for unmanned aircraft, and it uses an observation table to make decisions for the aircraft. In [19], the observation table is realized by training DNNs instead of storing it.
Networks. On MNIST, we trained seven fully connected networks of the size 6 × 20, 3 × 50, 3 × 100, 6 × 100, 6 × 200, 9 × 200, and 6 × 500, where m × n refers to m hidden layers and n neurons in each hidden layer, and we name them from FNN2 to FNN8, respectively (we also have a small network FNN1 for testing). On ACAS Xu, we randomly choose three networks used in [20], all of the size 6 × 50.

Improvement in precision
First we compare DeepPoly and DeepSRGR in terms of their precision of robustness verification. We consider the following two indices: (i) the maximum radius that the two tools can verify, and (ii) the number of uncertain ReLU neurons whose behaviors can be further determined by DeepSRGR. For each network, we randomly choose three images from the MNIST dataset, and calculate the maximum radius that the two tools can verify through a binary search on the seven FNNs. In column "# uncertain ReLU" we record the number of the uncertain ReLU neurons when first applying DeepPoly, and also count how many of them are renewed, namely become definitely activated/deactivated in later iterations when applying DeepSRGR. Table 1 shows the results. We can see from Table 1 that DeepSRGR can verify much stronger (i.e., larger maximum radius) robustness properties than DeepPoly. The average number of iterations for ruling out a spurious region is 2.875, and about half of the spurious regions can be ruled out within 2 iterations. DeepSRGR sometimes determines the behaviors of a large proportion of uncertain ReLU neurons on large networks: Considering the last picture of the most challenging network FNN8, more than ninety percent (1269/1371 ≈ 92.6%) of the uncertain neurons are renewed. The improvement in precision evaluated in this experiment applies to verification of both robustness and quantitative robustness, and this is why our method is effective in both tasks.
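The binary search for the maximum verifiable radius can be sketched as below. The callable `verify` is a hypothetical wrapper around a verifier (DeepPoly or DeepSRGR) for a fixed network and input, and the upper limit and tolerance are illustrative parameters. The search treats verification as monotone in the radius, which is a working assumption rather than a guarantee, so the result is best read as a radius that was actually verified.

```python
from typing import Callable


def max_verified_radius(verify: Callable[[float], bool], upper: float, tol: float = 1e-4) -> float:
    """Binary search for the largest radius in [0, upper] that `verify` accepts."""
    lo, hi = 0.0, upper
    if verify(hi):
        return hi
    while hi - lo > tol:
        mid = (lo + hi) / 2.0
        if verify(mid):
            lo = mid    # mid was verified: move the lower end up
        else:
            hi = mid    # mid failed: shrink the upper end
    return lo


if __name__ == "__main__":
    # Toy stand-in: pretend every radius up to 0.037 can be verified.
    print(max_verified_radius(lambda r: r <= 0.037, upper=0.1))
```

The returned value is either 0.0 or a radius on which `verify` returned true, so it never overstates what the underlying tool has shown.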

Robustness verification performance
In this setting, we randomly choose 50 samples from the MNIST dataset. We fix four radii, 0.037, 0.026, 0.021, and 0.015, for the four networks FNN4-FNN7, respectively, and verify the robustness property with the corresponding radius on the 50 inputs. Each radius is chosen to be very challenging for the corresponding network. We observe that, by increasing the termination threshold N from 5 to 50, only two more properties out of 15 can be verified. This suggests that our method can effectively identify the spurious regions which are relevant to verification of the property in a small number of iterations.

Quantitative robustness verification on ACAS Xu networks
We evaluate DeepSRGR for quantitative robustness verification on ACAS Xu networks. We randomly choose five inputs, and compute the maximum robustness radius for each input on the three networks with DeepPoly through a binary search. In our experiment, the radius for a running example is the maximum robustness radius plus 0.02, 0.03, 0.04, 0.05, and 0.06. We use the powerset technique and the number of splits is 32. For DeepPoly, the robustness confidence it gives is the proportion of the splits on which DeepPoly verifies the property. Fig. 4 shows the results. We can see that DeepSRGR gives a significantly better over-approximation of $1 - \eta$ than DeepPoly. That is, in more than 90% of the running examples, our over-approximation is no more than one half of that given by DeepPoly, and in more than 75% of the cases, our over-approximation is even smaller than one tenth of that given by DeepPoly.

Related Works and Conclusion
We have already discussed the papers most closely related to ours. Here we add some more recent results. Marabou [21] has been developed as the next generation of Reluplex. Recently, verification approaches based on abstraction of DNN models have been proposed in [11,2]. In addition, alternative approaches based on constraint-solving [26,29,5,25], layer-by-layer exhaustive search [16], global optimization [31,9,32], functional approximation [47], reduction to two-player games [48,49], and star set abstraction [41,40] have been proposed as well.
In this work, we propose a spurious region guided refinement approach for robustness and quantitative robustness verification of deep neural networks, where abstract interpretation calculates an abstraction, and linear programming performs refinement with the guidance of the spurious region. Our experimental results show that our tool can significantly improve the precision of DeepPoly, verify more robustness properties, and often provide a quantitative robustness with strict soundness guarantee.
Abstract interpretation based frameworks are readily extensible to different DNN models and different properties, and they can incorporate different verification methods. As future work, we will investigate how to increase the precision further by using more precise linear over-approximation like [35].