Robustness verification of ReLU networks via quadratic programming

Kuvshinov, Aleksei; Günnemann, Stephan

doi:10.1007/s10994-022-06132-9

Open access
Published: 16 March 2022

Volume 111, pages 2407–2433, (2022)

Machine Learning

Abstract

Neural networks are known to be sensitive to adversarial perturbations. To investigate this undesired behavior we consider the problem of computing the distance to the decision boundary (DtDB) from a given sample for a deep neural net classifier. In this work we present a procedure where we solve a convex quadratic programming (QP) task to obtain a lower bound on the DtDB. This bound is used as a robustness certificate of the classifier around a given sample. We show that our approach provides better or competitive results in comparison with a wide range of existing techniques.

1 Introduction

The high predictive power of neural network classifiers makes them the method of choice to tackle challenging classification problems in many areas. However, questions regarding the robustness of their performance under slight input perturbations still remain open, severely limiting the applicability of deep neural network classifiers to sensitive tasks that require certification of the obtained results.

In recent years this issue has gained a lot of attention, resulting in a large variety of methods tackling tasks ranging from adversarial attacks and defenses against them to robustness verification and robust training. In this work we focus on robustness verification, that is, computing the distance from a given anchor point $x^{0}$ in the input space to its closest adversarial, i.e. a point that is assigned a different class label by the network. This problem plays a fundamental role in understanding the behavior of deep classifiers and essentially provides the only reliable way to assess classifier robustness. Unfortunately, the problem is computationally hard. For deep classifiers with ReLU activation the verification problem can equivalently be reformulated as a mixed integer programming (MIP) task and was shown to be NP-complete by Katz et al. (2017), so no polynomial time algorithm exists unless P = NP. Even worse, Weng et al. (2018) showed that an approximation of the minimum adversarial perturbation of a certain (high) quality cannot be found within polynomial time.

Related work There exist two streams of related work on robustness verification of deep ReLU classifiers. This categorization is based on whether they are solving the verification problem exactly or verifying a bound on the distance to the decision boundary (DtDB).

The first group of methods are exact verification approaches. As mentioned above, the verification task can be modeled using MIP techniques. Katz et al. (2017) present a modification of the simplex algorithm, based on satisfiability modulo theories (SMT), that can be used to solve the verification task exactly for smaller ReLU networks. Other approaches (Ehlers 2017) rely on SMT solvers when solving the described task. Bunel et al. (2018) provide an overview and comparison of those. Other exact methods (Dutta et al. 2018; Lomuscio and Maganti 2017; Tjeng et al. 2017) deploy MIP solvers together with presolving to find a tight formulation of the MIP problem, while Jordan et al. (2018) use an algorithm to find the largest ball around the anchor point that touches the decision boundary.

The second popular class of methods for verifying classifier robustness deals with verification of an $\epsilon$-neighborhood: given an anchor point $x^{0}$ and an $\epsilon >0$, the task is to verify whether an adversarial point exists within the $\epsilon$ neighborhood of $x^{0}$ which is defined with respect to a certain norm in the input space. All existing methods relax the initial problem and require bounds on activation inputs in each layer. These bounds should be as tight as possible to ensure good final results. Raghunathan et al. (2018a, b), Dvijotham et al. (2018, 2019) consider semidefinite (SDP) and linear (LP) problems as relaxations of the $\epsilon$-verification problem. Wong and Kolter (2018) replace ReLU constraints by linear constraints and consider the dual formulation of the obtained LP relaxation. Weng et al. (2018) present an approach that also uses linear functions (later extended to quadratic functions by Zhang et al. 2018) to deal with nonlinear activation functions and propagate the layer-wise output bounds until the final layer. Salman et al. (2019) provide a unifying framework for the approaches using neuron-wise relaxations of the activation functions and use the best possible convex relaxation. Finally, Hein and Andriushchenko (2017), Tsuzuku et al. (2018) use the Lipschitz constant of the transformations within classifier’s architecture.

Our approach belongs to the same group of the inexact verifiers, but deals with constructing lower bounds on DtDB without necessarily restricting admissible adversarial points to a given neighborhood. Croce et al. (2019) leverage the piecewise affine nature of the outputs of a ReLU classifier and compute lower bounds on DtDB by assuming that the classifier behaves globally the same way it does in the linear region around the given anchor point. The $\epsilon$-verification task is closely related to this problem since each $\epsilon$-neighborhood that is certified as adversarial-free immediately provides a lower bound on the minimal adversarial perturbation magnitude. It is also a common strategy for the $\epsilon$-verification methods to use a binary search or a Newton method on top of their algorithm to find the largest $\epsilon$ such that the $\epsilon$-neighborhood around $x^{0}$ is still successfully verified as robust.

Adversarial attacks Constructing misclassified examples that are close to the anchor point can be considered as a complementary research direction to robustness verification since each adversarial example by definition provides an upper bound on the DtDB. Many methods were proposed to construct such points (Szegedy et al. 2014; Goodfellow et al. 2015; Kurakin et al. 2016; Papernot et al. 2016; Madry et al. 2017; Carlini and Wagner 2017).

Robust training The question of how to actually train a robust classifier is closely related to robustness verification since the latter might allow us to construct some type of robust loss based on the insights from the verification procedure (Hein and Andriushchenko 2017; Madry et al. 2017; Wong and Kolter 2018; Raghunathan et al. 2018a; Tsuzuku et al. 2018; Wang et al. 2018; Croce et al. 2019). We leave this direction for future work.

1.1 Contributions

1. We propose a novel relaxation of the DtDB problem in the form of a QP task allowing efficient computation of high quality lower bounds on the DtDB in $l_{2}$-norm with an extension to $l_{\infty}$-norm. We reach state-of-the-art performance for dense and convolutional networks compared to the bounds obtained from methods based on LP relaxations (CROWN by Zhang et al. 2018 and ConvAdv by Wong and Kolter 2018). Furthermore, our method performs much faster than methods based on SDP relaxations (Raghunathan et al. 2018b), while providing smaller lower bounds. This is a fundamental property due to the difference in computational complexity between SDP and QP tasks.
2. Unlike $\epsilon$-verification techniques, we provide a lower bound on DtDB without an initial guess and without computing bounds for the neuron activation values in each layer. If additional information is present allowing the user to bound the distance to any admissible adversarial point from above, we incorporate these upper bounds in our formulation to verify larger regions around the anchor point. Such bounds have to be tight enough to verify non-trivial neighborhoods and play an important role in other relaxation techniques such as the SDP based approaches by Raghunathan et al. (2018b) and Dvijotham et al. (2019). We describe an efficient search method for pre-activation bounds resulting in larger verified regions based on sequential convex quadratic programming (QP).
3. To analyze the gap in the optimal objective function value between the initial DtDB problem and our relaxation we establish a connection of DtDB’s dual problem to our QP task. It allows us to deconstruct this gap into two components. Moreover, we discuss how we improve the QP formulation to close the gap to DtDB and how we bound one of its components.

The remainder of this paper is organized as follows. In Sect. 2 we introduce the necessary notation. In Sect. 3.1 we formally define the problem of finding the smallest adversarial perturbation and in Sect. 3.2 introduce its QP relaxation QPRel. There we also formulate the dual DtDB problem as the best convex QP relaxation. In Sect. 3.3 we introduce additional linear constraints using bounds on the region of the admissible points around $x^{0}$ and summarize our verification procedure. In Sect. 4 we compare our approach to the LP- and SDP-based competitors. We summarize our findings in Sect. 5 and discuss the directions for future work.

2 Notation and idea

We consider a neural network consisting of L linear transformations representing dense, convolutional, skip or average pooling layers and $L-1$ ReLU activations (no ReLU after the last linear layer). The number of neurons in layer l is denoted as $n_{l}$ for $l=0,\ldots ,L$, meaning that the data has $n_{0}$ features and $n_{L}$ classes. Furthermore, we present our analysis for the $l_{2}$-norm as perturbation measure since only a few available methods are applicable to this setting. To make our method comparable with the approach by Raghunathan et al. (2018b) we propose a generalization to the $l_{\infty}$-setting as well.

Given sample $x^{0}\in {\mathbb {R}}^{n_{0}}$, weight matrices $W^{l}\in {\mathbb {R}}^{n_{l}\times n_{l-1}}$, and bias vectors $b^{l}\in {\mathbb {R}}^{n_{l}}$, we define the output of the ith neuron in the lth layer after the ReLU activation as

$$\begin{aligned}&x^{l}_{i} = \left[ W^{l}_{i}x^{l-1} + b^{l}_{i}\right] _{+}\text { and} \\&f_{i}(x^{0}) = x^{L}_{i} = W^{L}_{i}x^{L-1} + b^{L}_{i}, \end{aligned}$$

(1)

where $\left[ x \right] _{+}$ is the positive part of x and $f(x^{0})=x^{L}$ denotes the output of the complete forward pass through the network. We start with the observation that for each pair of scalars x and y the following holds (also used by Raghunathan et al. 2018b; Dvijotham et al. 2019 for $\epsilon$-verification).

$$x=\left[ y\right] _{+} \Longleftrightarrow x\ge 0,\quad x-y\ge 0,\quad x(x-y)=0.$$

(2)

This relation allows us to obtain an optimization problem with linear complementarity constraints.
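Relation (2) is easy to check numerically. The following minimal sketch (plain Python; the helper names `relu` and `satisfies_complementarity` are hypothetical and only serve this illustration) tests the three conditions on the right-hand side of (2) for a few scalar pairs.

```python
def relu(y: float) -> float:
    """Positive part [y]_+ of a scalar."""
    return max(y, 0.0)


def satisfies_complementarity(x: float, y: float, tol: float = 1e-12) -> bool:
    """Check x >= 0, x - y >= 0 and x * (x - y) = 0 up to a small tolerance."""
    return x >= -tol and x - y >= -tol and abs(x * (x - y)) <= tol


# The true ReLU output satisfies all three conditions.
for y in (-2.0, -0.5, 0.0, 0.7, 3.0):
    assert satisfies_complementarity(relu(y), y)

# A point above the ReLU graph fulfils both inequalities,
# but violates the product condition: x * (x - y) = 0.5.
assert not satisfies_complementarity(1.0, 0.5)

# A point below the ReLU graph violates x - y >= 0.
assert not satisfies_complementarity(0.0, 0.5)
```

Dropping the product condition while keeping the two inequalities is what later turns the exact problem into a relaxation.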

3 Verification as an optimization task

3.1 Formulation of DtDB

For a given sample $\tilde{x}^{0}$, pre-trained neural network f, predicted label $\tilde{y}$ and adversarial label y we aim to find the closest point to $\tilde{x}^{0}$ in ${\mathbb {R}}^{n_{0}}$ that has a larger or equal probability of being classified as y compared to the initial label. This task corresponds to the following optimization problem.

$$\min _{x^{0}\in {\mathbb {R}}^{n_{0}}} \Vert x^{0} - \tilde{x}^{0}\Vert ^{2},\quad \text { s.t. } (e_{\tilde{y}} - e_{y})^{T}f(x^{0}) \le 0,$$

(DtDB)

where $e_{i}$ is the ith unit vector in ${\mathbb {R}}^{n_{L}}$ and $\Vert x\Vert$ denotes the Euclidean norm of x. To compute the distance from $\tilde{x}^{0}$ to the (full) decision boundary, one needs to compute the solution for all adversarial labels $y=1,\ldots ,n_{L}$ except $\tilde{y}$. Next we unfold the above optimization problem using (1), where x denotes a container with all variables $x^{0},\ldots ,x^{L}$ and [L] is the set $\{1,\ldots ,L\}$ .

$$\begin{aligned}\min _{x\in {\mathbb {R}}^{n}} \Vert x^{0} - \tilde{x}^{0}\Vert ^{2},\quad \text { s.t. }& (e_{\tilde{y}} - e_{y})^{T}x^{L} \le 0, \quad x^{L} = W^{L}x^{L-1} + b^{L},\\&x^{l} = \text {ReLU}(W^{l}x^{l-1} + b^{l})\quad \text { for }l\in [L-1]. \end{aligned}$$

We apply (2) to reformulate the problem and eliminate $x^{L}$, such that from now on $n=n_{0}+\cdots +n_{L-1}$ and x contains only the remaining variables $x^{0},\ldots ,x^{L-1}$.

$$\min _{x\in {\mathbb {R}}^{n}} \Vert x^{0} - \tilde{x}^{0}\Vert ^{2},\quad \text { s.t. } (e_{\tilde{y}} - e_{y})^{T}\left( W^{L}x^{L-1} + b^{L} \right) \le 0,$$

(DtDB)

$$\begin{aligned}&\left( x^{l} \right) ^{T}\left( x^{l} - \left( W^{l}x^{l-1} + b^{l}\right) \right) = 0\quad \text { for }l\in [L-1], \end{aligned}$$

(3)

$$\begin{aligned}&x^{l} - \left( W^{l}x^{l-1} + b^{l}\right) \ge 0,\quad x^{l} \ge 0\quad \text { for }l\in [L-1]. \end{aligned}$$

(4)

3.2 QP relaxation

To get rid of the quadratic equality constraints (3) we consider a Lagrangian relaxation of DtDB:

$$\begin{aligned}\min _{x\in {\mathbb {R}}^{n}} \Vert x^{0} - \tilde{x}^{0}\Vert ^{2} + c(x,\lambda ),\quad \text { s.t. } & (e_{\tilde{y}} - e_{y})^{T}\left( W^{L}x^{L-1} + b^{L} \right) \le 0, \\&x^{l} - \left( W^{l}x^{l-1} + b^{l}\right) \ge 0,\quad x^{l} \ge 0\quad \text { for }l\in [L-1], \end{aligned}$$

(QPRel)

where for arbitrary vectors $x^{0}\in {\mathbb {R}}^{n_{0}},\ldots ,x^{L-1}\in {\mathbb {R}}^{n_{L-1}}$ and $\lambda \in {\mathbb {R}}^{L-1}_{+}$ we define

$$c(x,\lambda ) := \sum _{l=1}^{L-1} \lambda _{{l}}\left( x^{l} \right) ^{T} \left( x^{l} - \left( W^{l}x^{l-1} + b^{l}\right) \right)$$

(5)

as the propagation gap. The obtained problem is indeed a QP with linear constraints. We need to clarify two questions. How does the problem QPRel help us with solving DtDB and how do we solve this problem itself efficiently?

3.2.1 QPRel vs. DtDB

QPRel returns robust radius It follows directly from the definition of the Lagrange relaxation QPRel that for arbitrary non-negative $\lambda$ it holds that:

if x is feasible for DtDB we have $c(x,\lambda )=0$, meaning that x equals the vector obtained by propagating $x^{0}$ through the neural network as defined in (1),
if x is feasible for QPRel then $c(x,\lambda )\ge 0$, meaning that there might be a slack between the true output of layer l when getting $x^{0}$ as an input and the value of $x^{l}$.
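The two observations above can be reproduced on a toy example. The sketch below assumes NumPy and a hand-picked network with one hidden layer; the weights, the anchor point and the function names `forward` and `propagation_gap` are illustrative assumptions, not taken from the paper's code.

```python
import numpy as np


def forward(x0, weights, biases):
    """Return [x^0, x^1, ...] obtained by applying (1) with ReLU in every given layer."""
    xs = [np.asarray(x0, dtype=float)]
    for W, b in zip(weights, biases):
        xs.append(np.maximum(W @ xs[-1] + b, 0.0))
    return xs


def propagation_gap(xs, weights, biases, lam):
    """c(x, lambda) from (5); lam[l - 1] is the multiplier of hidden layer l."""
    gap = 0.0
    for l, (W, b) in enumerate(zip(weights, biases), start=1):
        slack = xs[l] - (W @ xs[l - 1] + b)
        gap += lam[l - 1] * float(xs[l] @ slack)
    return gap


# Toy network with a single hidden layer (illustrative values).
W1 = np.array([[1.0, -1.0], [0.5, 2.0]])
b1 = np.array([0.0, -1.0])
lam = [0.3]
x0 = np.array([1.0, 0.2])

# Exact propagation: the gap vanishes.
xs_true = forward(x0, [W1], [b1])
gap_true = propagation_gap(xs_true, [W1], [b1], lam)

# Lifting the first hidden activation keeps the linear constraints
# x^1 >= 0 and x^1 >= W^1 x^0 + b^1 satisfied, but the gap becomes positive.
xs_slack = [x0, xs_true[1] + np.array([0.5, 0.0])]
gap_slack = propagation_gap(xs_slack, [W1], [b1], lam)

print(gap_true, gap_slack)
```

The second point is feasible for QPRel but not for DtDB, which is exactly the slack that the penalty term $c(x,\lambda )$ charges for.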

In general the following holds for the relation between the solution of QPRel and DtDB (see Fig. 1). We include the proof of Lemma 1 and all other results in “Appendix B”.

Lemma 1

Denote the solution of QPRel by $x_{\text{qp}}$ and the square root of its optimal objective value by $d_{\text{qp}}$, and let d be the square root of the optimal objective value of DtDB. The following holds:

1. $d_{\text{qp}} \le d$, and when $c( x_{\text{qp}}, \lambda ) = 0$ we have $d_{\text{qp}} = d$ and $x_{\text{qp}}$ is optimal for DtDB.
2. $d_{\text{qp}}$ is monotone with respect to $\lambda$, that is, for two non-negative $\lambda ^{1}, \lambda ^{2}$ with $\lambda ^{1} \le \lambda ^{2}$ elementwise it holds that $d_{\text{qp}}(\lambda ^{1}) \le d_{\text{qp}}(\lambda ^{2})$.

The first result from Lemma 1 ensures that $d_{\text{qp}}$ provides a radius of a certified region around the anchor point. The second part indicates that we should choose $\lambda$ as large as possible to get our lower bound closer to DtDB. Unfortunately, as we show below, QPRel becomes non-convex for large values of $\lambda$. While one could try to tackle a non-convex QP with proper optimization methods, we next derive conditions under which QPRel is guaranteed to be convex and can be solved efficiently.

Convexity of QPRel To look into the problem QPRel in more detail we introduce the Hessian $M^{\lambda}$ (which is a constant matrix) of its objective function. Let $E_{l}\in {\mathbb {R}}^{n_{l}\times n_{l}}$ be the identity matrix of the corresponding dimension and set $\lambda _{0} = 1$. We define $M^{\lambda} \in {\mathbb {R}}^{n\times n}$ as the symmetric block tridiagonal matrix with blocks $M^{\lambda} _{l,l}=2\lambda _{{l}} E_{l} \text { and } M^{\lambda} _{l,l-1} = -\lambda _{{l}} W^{l}.$ Using this matrix we rewrite the objective function from QPRel as (see “Appendix B”, Lemma 4 for the proof and definition of the terms)

$$\min _{x\in {\mathbb {R}}^{n}} \frac{1}{2} x^{T} M^{\lambda} (W) x + x^{T} B_{1} (b, \lambda , \tilde{x}^{0}) + \Vert \tilde{x}^{0}\Vert ^{2}, \quad \text { s.t. } \bar{M}(W) x - B_{2}(b) \ge 0,$$

(6)

where $B_{1}$ influences only the linear term and is therefore not relevant in this section. From this reformulation we clearly see that the matrix $M^{\lambda}$ determines the (non-)convexity of the objective function. The following theorem provides a sufficient and a necessary condition on $\lambda$, depending on the weights $W^{l}$, for $M^{\lambda}$ to be positive semi-definite. This allows us to use off-the-shelf QP-solvers with excellent convergence properties.

Theorem 1

Let $W^{1},\ldots ,W^{L-1}$ be the weights of a pre-trained neural network and $\Vert W\Vert$ the spectral norm of an arbitrary matrix. Then the following two conditions for $\lambda$ provide correspondingly a sufficient and a necessary criterion for the matrix $M^{\lambda}$ to be positive semi-definite.

$$\begin{aligned}&\text {(suf. condition)} \quad \lambda _{{1}}\le \frac{2\lambda _{{0}}}{\Vert W^{1}\Vert ^{2}}\quad \text { and }\quad \lambda _{{l}}\le \frac{\lambda _{{l-1}}}{\Vert W^{l}\Vert ^{2}} \quad \text { for }l\ge 2, \end{aligned}$$

(7)

$$\begin{aligned}&\text {(nec. condition)} \quad \lambda _{{l}}\le \frac{4\lambda _{{l-1}}}{\Vert W^{l}\Vert ^{2}} \quad \text { for }l\ge 1. \end{aligned}$$

(8)

Furthermore, we define $\underline{\lambda }$ and $\bar{\lambda }$ that correspondingly satisfy conditions (7) and (8) with equality:

$$\underline{\lambda }_{l} = 2 \prod _{k=1}^{l} \frac{1}{\Vert W^{k}\Vert ^{2}}, \quad \bar{\lambda }_{l} = 4^{l}\prod _{k=1}^{l} \frac{1}{\Vert W^{k}\Vert ^{2}}.$$

(9)

Finally, in the case of a single hidden layer, $M^{\lambda }$ is positive semi-definite even for $\lambda =\bar{\lambda }$ as defined in (9).
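The statements of Theorem 1 can be checked numerically on small examples. The following sketch assumes NumPy and random Gaussian weight matrices; the helper names `hessian`, `lambda_bounds` and `min_eig` are hypothetical. It assembles $M^{\lambda}$ from the block definition given above, with $\lambda _{0}=1$, and evaluates its smallest eigenvalue at $\underline{\lambda }$ and $\bar{\lambda }$ from (9).

```python
import numpy as np


def hessian(weights, lam):
    """Block tridiagonal M^lambda for weights [W^1, ..., W^{L-1}]
    and multipliers lam = [lambda_0, ..., lambda_{L-1}]."""
    sizes = [weights[0].shape[1]] + [W.shape[0] for W in weights]
    offsets = np.concatenate(([0], np.cumsum(sizes)))
    M = np.zeros((offsets[-1], offsets[-1]))
    for l, n_l in enumerate(sizes):
        cur = slice(offsets[l], offsets[l + 1])
        M[cur, cur] = 2.0 * lam[l] * np.eye(n_l)
        if l >= 1:
            prev = slice(offsets[l - 1], offsets[l])
            M[cur, prev] = -lam[l] * weights[l - 1]
            M[prev, cur] = -lam[l] * weights[l - 1].T
    return M


def lambda_bounds(weights):
    """Return (lambda_underline, lambda_bar) from (9), each prefixed by lambda_0 = 1."""
    inv_sq = np.array([1.0 / np.linalg.norm(W, 2) ** 2 for W in weights])
    prods = np.cumprod(inv_sq)
    powers = 4.0 ** np.arange(1, len(weights) + 1)
    lower = np.concatenate(([1.0], 2.0 * prods))
    upper = np.concatenate(([1.0], powers * prods))
    return lower, upper


def min_eig(M):
    """Smallest eigenvalue of a symmetric matrix."""
    return float(np.linalg.eigvalsh(M).min())


rng = np.random.default_rng(0)
one_hidden = [rng.standard_normal((5, 3))]
two_hidden = [rng.standard_normal((5, 3)), rng.standard_normal((4, 5))]

lo1, up1 = lambda_bounds(one_hidden)
lo2, up2 = lambda_bounds(two_hidden)

print("one hidden layer :", min_eig(hessian(one_hidden, lo1)), min_eig(hessian(one_hidden, up1)))
print("two hidden layers:", min_eig(hessian(two_hidden, lo2)), min_eig(hessian(two_hidden, up2)))
```

For the single hidden layer the smallest eigenvalue at $\bar{\lambda }$ should be zero up to rounding, while for two hidden layers the same check at $\bar{\lambda }$ is expected to return a negative value, so the necessary condition alone does not guarantee convexity there.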

We use (7), (8) and our previous results as guidelines for the choice of $\lambda$. Since $d_{\text{qp}}(\lambda )$ is monotone in the sense of Lemma 1 we perform a binary search between $\underline{\lambda }$ and $\bar{\lambda }$ to find the point closest to $\bar{\lambda }$ (where QP is non-convex for networks with more than one hidden layer) such that the QP remains convex. We denote the obtained $\lambda$ by $\hat{\lambda }$. This preprocessing step does not considerably affect the runtime since checking whether a matrix is positive semi-definite is done efficiently by Cholesky decomposition. However, it significantly improves the final bounds compared to the bounds obtained when using $\lambda =\underline{\lambda }$ from (7).

Note that this procedure has to be done only once for a given classifier; the resulting $\hat{\lambda }$ is then used to solve QPRel for all anchor points and adversarial labels. This is a significant computational advantage compared to SDP-based $\epsilon$-verification methods. For example, Dvijotham et al. (2019) include the dual multipliers as variables in a relaxation of the SDP problem that has to be solved for each combination of the anchor point, adversarial label and verified epsilon.
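One possible way to organize this preprocessing step is sketched below. It is a simplification under stated assumptions: the search runs along the straight line from $\underline{\lambda }$ to $\bar{\lambda }$ with a single parameter t (the text does not fix the exact search path), NumPy is used, and the Cholesky test adds a tiny diagonal shift so that matrices that are positive semi-definite up to rounding are accepted. The names `hessian`, `is_convex` and `search_lambda` are hypothetical.

```python
import numpy as np


def hessian(weights, lam):
    """Block tridiagonal M^lambda, lam = [lambda_0, ..., lambda_{L-1}]."""
    sizes = [weights[0].shape[1]] + [W.shape[0] for W in weights]
    offsets = np.concatenate(([0], np.cumsum(sizes)))
    M = np.zeros((offsets[-1], offsets[-1]))
    for l, n_l in enumerate(sizes):
        cur = slice(offsets[l], offsets[l + 1])
        M[cur, cur] = 2.0 * lam[l] * np.eye(n_l)
        if l >= 1:
            prev = slice(offsets[l - 1], offsets[l])
            M[cur, prev] = -lam[l] * weights[l - 1]
            M[prev, cur] = -lam[l] * weights[l - 1].T
    return M


def is_convex(M, jitter=1e-10):
    """Cholesky-based test; the jitter is an assumption of this sketch."""
    try:
        np.linalg.cholesky(M + jitter * np.eye(M.shape[0]))
        return True
    except np.linalg.LinAlgError:
        return False


def search_lambda(weights, tol=1e-4):
    """Bisection on t in [0, 1] for lambda(t) = (1 - t) * lower + t * upper."""
    inv_sq = np.array([1.0 / np.linalg.norm(W, 2) ** 2 for W in weights])
    prods = np.cumprod(inv_sq)
    powers = 4.0 ** np.arange(1, len(weights) + 1)
    lower = np.concatenate(([1.0], 2.0 * prods))   # satisfies (7) with equality
    upper = np.concatenate(([1.0], powers * prods))  # satisfies (8) with equality
    if is_convex(hessian(weights, upper)):
        return upper
    t_lo, t_hi = 0.0, 1.0  # lambda(t_lo) is always convex, lambda(t_hi) is not
    while t_hi - t_lo > tol:
        t = 0.5 * (t_lo + t_hi)
        if is_convex(hessian(weights, (1 - t) * lower + t * upper)):
            t_lo = t
        else:
            t_hi = t
    return (1 - t_lo) * lower + t_lo * upper


rng = np.random.default_rng(1)
weights = [rng.standard_normal((6, 4)), rng.standard_normal((5, 6))]
lam_hat = search_lambda(weights)
print(lam_hat)
```

Bisection along a line is justified because $M^{\lambda}$ depends linearly on $\lambda$, so the set of multipliers with a positive semi-definite $M^{\lambda}$ is convex and contains $\underline{\lambda }$.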

Relation to the dual of DtDB Since QPRel is a Lagrangian relaxation of the non-convex quadratically constrained QP DtDB, we unavoidably have a gap between their optimal objective values, but get a simpler problem to solve in return. To investigate and approximate the components of that gap, we look at the relation of DtDB and QPRel from the perspective of duality theory. A similar question was investigated by Salman et al. (2019) for the existing $\epsilon$-verification methods based on neuron-wise LP-relaxations. However, our method does not fall into this category because the relaxation happens jointly for all layers.

Note that our formulation of the DtDB problem contains quadratic equality constraints (3) and therefore has a non-convex admissible set. For the derivation of its dual problem we refer to the supplementary material (see “Appendix B”) and summarize here the most important result.

Theorem 2

Solving the Lagrange dual problem of the non-convex DtDB is equivalent to solving the problem

$$\max _{\lambda \in {\mathbb {R}}^{L-1}_{+}}\text {QPRel}(\lambda ) \text { s.t. }M^{\lambda} \text { is positive semi-definite,}$$

where we slightly redefine the notation and write $\text {QPRel}(\lambda )$ for the optimal objective function value of QPRel for the corresponding $\lambda$. We also denote $\lambda ^{*}$ as the optimal value of $\lambda$ for the above problem.

Now we are ready to formulate a result that estimates how large the difference is between the optimal objective function values of QPRel for $\hat{\lambda }$, constructed using Theorem 1, and for the optimal $\lambda ^{*}$. The latter is defined by Theorem 2 and would provide the best bound we can get when restricting ourselves to convex QP relaxations.

Lemma 2

Denote $\lambda ^{*}$ as the optimal $\lambda$ defined in Theorem 2, $\hat{\lambda }$ as $\lambda$ we use for verification, $\bar{\lambda }$ as defined in (9), $c(x, \lambda )$ as the propagation gap defined in (5) and $\hat{x}_{qp}$ as the solution of $\text {QPRel}(\hat{\lambda })$. Then we get the following upper bound on the possible improvement of QPRel’s objective function for a $\lambda$ value that is different from our $\hat{\lambda }$:

$$\max _{\begin{array}{c} \lambda \ge 0 \\ M^{\lambda} \text { psd} \end{array}} \left( \text {QPRel}(\lambda ) - \text {QPRel}(\hat{\lambda })\right) =\text {QPRel}(\lambda ^{*}) - \text {QPRel}(\hat{\lambda }) \le c(\hat{x}_{qp}, \bar{\lambda } - \hat{\lambda }).$$

In summary, we have the following relation between the values defined above, where we add -P and -D to the problem name to denote its primal and dual forms respectively:

$$\text {DtDB-P} \ge \text {DtDB-D} = \text {QPRel} (\lambda ^{*}) \ge \text {QPRel}(\hat{\lambda }).$$

We have shown how to find a good $\hat{\lambda }$ and are able to estimate the gap resulting in the second $\ge$ sign as shown in Lemma 2. Additionally, in the next section we describe how to close the duality gap resulting in the first $\ge$ sign by introducing additional constraints to the QPRel problem.

3.3 Improving bounds via additional linear constraints

The initial DtDB problem and its relaxation QPRel do not require bounds on pre-activation values $W^{l}x^{l-1} + b^{l}$ frequently used in $\epsilon$-verification approaches. However, if available, these can improve our relaxation. That is, we can additionally bound the admissible set of QPRel by

$$\underline{a}^{l} \le W^{l}x^{l-1} + b^{l} \le \bar{a}^{l} \quad \text { for }l=1,\ldots ,L-1$$

(10)

given some bounds $\underline{a}^{l}, \bar{a}^{l} \in {\mathbb {R}}^{n_{l}}$ for layer l. Moreover, we include the following linear constraint on each neuron i in layer l as also widely used in other verification methods for ReLU networks (Ehlers 2017; Wong and Kolter 2018; Dvijotham et al. 2019; Salman et al. 2019).

$$-\bar{a}^{l}_{i} (W^{l}x^{l-1} + b^{l})_{i} + (\bar{a}^{l}_{i} -\underline{a}^{l}_{i}) x^{l}_{i} \le -\bar{a}^{l}_{i} \underline{a}^{l}_{i}.$$

(11)

Note that constraints (10) and (11) are linear and therefore the new relaxation is still a QP.
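For a single neuron, constraints (10) and (11) can be checked directly. The sketch below is plain Python with illustrative bounds; it assumes $\underline{a}^{l}_{i}<0<\bar{a}^{l}_{i}$, i.e. a neuron whose activation state is not fixed by the bounds, and the function name `satisfies_bounds` is hypothetical.

```python
def relu(z: float) -> float:
    return max(z, 0.0)


def satisfies_bounds(z: float, x: float, lower: float, upper: float,
                     tol: float = 1e-12) -> bool:
    """Constraints (10) and (11) for one neuron with pre-activation z
    and post-activation x, given bounds lower <= z <= upper."""
    in_box = lower - tol <= z <= upper + tol                                # (10)
    below_chord = -upper * z + (upper - lower) * x <= -upper * lower + tol  # (11)
    return in_box and below_chord


lower, upper = -1.0, 2.0  # illustrative bounds with lower < 0 < upper

# Every true ReLU pair (z, [z]_+) inside the box satisfies both constraints.
for k in range(-10, 21):
    z = k / 10
    assert satisfies_bounds(z, relu(z), lower, upper)

# A point above the chord through (lower, 0) and (upper, upper) is cut off.
assert not satisfies_bounds(0.0, 1.0, lower, upper)

# A point between the ReLU graph and the chord remains admissible:
# the constraints tighten the relaxation but do not make it exact.
assert satisfies_bounds(0.0, 0.5, lower, upper)
```

Together with $x^{l}\ge 0$ and $x^{l}\ge W^{l}x^{l-1}+b^{l}$, inequality (11) describes the triangle spanned by the ReLU graph over the interval given by the bounds.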

Before discussing how we exploit these bounds, we first introduce the notion of a proper bound propagation mapping. We need this to ensure that the resulting solution of QPRel with these additional constraints is still a lower bound on DtDB. For a fixed anchor point and network weights consider a mapping from a bound in the input layer $\gamma \in {\mathbb {R}}_{+}$ to the bounds $\underline{a}^{l}(\gamma ), \bar{a}^{l}(\gamma )\in {\mathbb {R}}^{n_{l}}$. We call this mapping a proper bound propagation mapping if

1. bounds are valid: for all $x^{0}$ with $\Vert \tilde{x}^{0} - x^{0}\Vert \le \gamma$, inequalities (10) hold for the corresponding pre-activation values in each layer as defined in (1), and
2. bounds are monotone: for arbitrary $\gamma _{1}\le \gamma _{2}$, in each hidden layer l of the network there holds $\bar{a}^{l}(\gamma _{2})\ge \bar{a}^{l}(\gamma _{1})\ge \underline{a}^{l} (\gamma _{1})\ge \underline{a}^{l}(\gamma _{2})$.

In our experiments we deploy the bound propagation technique by Wong and Kolter (2018) to obtain bounds $\underline{a}^{l}, \bar{a}^{l}$ since it satisfies these properties and is computationally efficient.

Lemma 3

When using a proper bound propagation mapping, the following holds for the square root of the optimal objective function value $d_{\text{qp}}(\gamma )$ of QPRel (we drop the dependence on $\lambda$ since it is now fixed) solved with the additional constraints (10) and (11) using pre-activation bounds $\underline{a}^{l}(\gamma ), \bar{a}^{l}(\gamma )$.

1. $d_{\text{qp}}(\gamma _{1}) \ge d_{\text{qp}}(\gamma _{2})$ if $\gamma _{1} \le \gamma _{2}$, i.e. $d_{\text{qp}}(\gamma )$ is monotonically decreasing, where we say that $d_{\text{qp}}(\gamma )=\infty$ if the corresponding QPRel with (10) and (11) is infeasible,
2. if $d_{\text{qp}}(\gamma ) \le \gamma$ then $d_{\text{qp}}(\gamma )$ is a lower bound on DtDB (which might not be the case otherwise, see “Appendix B” for details).

Guided by the results of Lemma 3 we apply binary search to find the smallest $\gamma$ that still provides us with a lower bound $d_{\text{qp}}(\gamma )$ on the smallest adversarial perturbation (the smaller the value of $\gamma$, the better the resulting bound). In each step we solve a convex QP and increase $\gamma$ if QPRel is infeasible, that is, the current bounds $\underline{a}^{l}(\gamma ), \bar{a}^{l}(\gamma )$ are too tight, or if $d_{\text{qp}}(\gamma ) > \gamma$, since in this case we do not have a certificate for $d_{\text{qp}}(\gamma )$ to be a valid lower bound on DtDB. Otherwise we set the current $\gamma$ as the right boundary of the search interval and proceed with a smaller value of $\gamma$. The whole procedure is summarized in Algorithm 1.
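The search loop described above can be sketched as follows. This is not Algorithm 1 itself but a minimal illustration in plain Python under stated assumptions: `d_qp` is a hypothetical oracle that solves QPRel with constraints (10) and (11) for a given $\gamma$ and returns infinity when the problem is infeasible, `gamma_max` is an initial value for which a certificate is already available (for instance an upper bound on DtDB), and `tol` and `max_steps` are stopping parameters invented for this sketch. The toy oracle only mimics the monotonicity from Lemma 3.

```python
import math
from typing import Callable


def search_gamma(d_qp: Callable[[float], float], gamma_max: float,
                 tol: float = 1e-6, max_steps: int = 60) -> float:
    """Bisection for the smallest gamma with d_qp(gamma) <= gamma.

    Returns the bound d_qp(gamma) at the smallest certified gamma found."""
    lo, hi = 0.0, gamma_max
    best = d_qp(hi)
    if not best <= hi:
        raise ValueError("gamma_max does not yield a certified bound")
    for _ in range(max_steps):
        if hi - lo <= tol:
            break
        mid = 0.5 * (lo + hi)
        d = d_qp(mid)
        if math.isinf(d) or d > mid:
            lo = mid            # infeasible or not certified: increase gamma
        else:
            hi, best = mid, d   # certified: try a smaller gamma
    return best


def toy_d_qp(gamma: float) -> float:
    """Stand-in oracle: infeasible for small gamma, then monotonically decreasing."""
    if gamma < 0.2:
        return math.inf
    return max(0.5, 1.2 - gamma)


bound = search_gamma(toy_d_qp, gamma_max=2.0)
print(bound)
```

Because $d_{\text{qp}}(\gamma )$ is monotonically decreasing, the condition $d_{\text{qp}}(\gamma )\le \gamma$ switches from false to true only once as $\gamma$ grows, which is what makes bisection applicable.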

3.4 $l_{\infty}$-Setting

For comparison with the SDP-based approach by Raghunathan et al. (2018b) we show how we apply our method to compute bounds on the distance to the closest adversarial measured using the $l_{\infty}$-norm. A straightforward way would be to modify the objective function accordingly. By introducing a new variable m representing $\Vert x^{0} - \tilde{x}^{0}\Vert _{\infty} ^{2} = \max _{i} (x_{i}^{0} - \tilde{x}_{i}^{0})^{2}$ and $n_{0}$ new quadratic constraints we get the following version of QPRel:

$$\begin{aligned}\min _{x\in {\mathbb {R}}^{n}, m\in {\mathbb {R}}} m + c(x, \lambda ) \text {, s.t. }& ( x_{i}^{0} - \tilde{x}_{i}^{0} )^{2} \le m, \quad i=1,\ldots ,n_{0}, \\&(e_{\tilde{y}} - e_{y})^{T}\left( W^{L}x^{L-1} + b^{L} \right) \le 0 ,\\&x^{l} - \left( W^{l}x^{l-1} + b^{l}\right) \ge 0,\quad x^{l} \ge 0\quad \text { for }l\in [L-1]. \end{aligned}$$

Note that the quadratic constraints do not harm the complexity since they describe a convex cone and can be handled by the QP-solvers. While this formulation is of a similar structure as the QPRel (quadratic objective as well as linear and quadratic constraints), the Hessian of the objective function is not positive semi-definite for any value of $\lambda$. Since $c(x, \lambda )$ is the only source of quadratic terms now (squared distance to the anchor point is now replaced by m), the new $M^{\lambda}$ is of the same form as in (6), but with $\lambda _{0}=0$. To see that we cannot affect the convexity of the objective function by the parameter $\lambda$ anymore consider vector x with an arbitrary $x^{0}\in {\mathbb {R}}^{n_{0}}$ as well as $x^{1} = \alpha W^{1}x^{0}$ for some $0<\alpha <1$ and $x^{l}=0$ for $l>1$. Then

$$\frac{1}{2}x^{T}M^{\lambda} x =\lambda _{1} \left( \Vert x^{1}\Vert ^{2} -\left( x^{1}\right) ^{T}W^{1}x^{0} \right) =\lambda _{1}(\alpha ^{2} - \alpha )\Vert W^{1}x^{0}\Vert ^{2}<0$$

whenever $\lambda _{1}>0$ and $W^{1}x^{0}\ne 0$, meaning that $M^{\lambda}$ cannot be positive semi-definite.
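The negative curvature direction can also be observed numerically. The sketch below assumes NumPy, one hidden layer and random weights (all illustrative); it builds $M^{\lambda}$ from the block definition in Sect. 3.2.1 with $\lambda _{0}=0$ and evaluates the quadratic part of the objective, $\frac{1}{2}x^{T}M^{\lambda}x$, at the vector constructed above.

```python
import numpy as np

rng = np.random.default_rng(0)
n0, n1 = 3, 4
W1 = rng.standard_normal((n1, n0))
lam1, alpha = 0.7, 0.5  # any lam1 > 0 and 0 < alpha < 1 will do

# Hessian with lambda_0 = 0: the x^0 diagonal block vanishes.
M = np.zeros((n0 + n1, n0 + n1))
M[n0:, n0:] = 2.0 * lam1 * np.eye(n1)
M[n0:, :n0] = -lam1 * W1
M[:n0, n0:] = -lam1 * W1.T

# Direction from the text: arbitrary x^0 and x^1 = alpha * W^1 x^0.
x0 = rng.standard_normal(n0)
x = np.concatenate([x0, alpha * (W1 @ x0)])

curvature = 0.5 * float(x @ M @ x)
expected = lam1 * (alpha ** 2 - alpha) * float(np.linalg.norm(W1 @ x0) ** 2)
smallest_eigenvalue = float(np.linalg.eigvalsh(M).min())

print(curvature, expected, smallest_eigenvalue)
```

Both the curvature along this direction and the smallest eigenvalue are negative, independently of how small $\lambda _{1}>0$ is chosen.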

To overcome this issue, we utilize the new quadratic constraints. We return to a convex QP by considering the following problem with a positive $\mu$.

$$\begin{aligned}\min _{x\in {\mathbb {R}}^{n}, m\in {\mathbb {R}}} m + c(x, \lambda ) +\mu \sum _{i=1}^{n_{0}}\left( (x_{i}^{0} - \tilde{x}_{i}^{0})^{2} - m \right) , \quad \text {s.t. } & ( x_{i}^{0} - \tilde{x}_{i}^{0} )^{2} \le m \quad \text {for }i=1,\ldots ,n_{0},\\&(e_{\tilde{y}} - e_{y})^{T}\left( W^{L}x^{L-1} + b^{L} \right) \le 0,\\&x^{l} - \left( W^{l}x^{l-1} + b^{l}\right) \ge 0, x^{l} \ge 0 \quad \text {for }l\in [L-1]. \end{aligned}$$

Clearly, for $0<\mu \le n_{0}^{-1}$ the solution of this problem is a finite lower bound on DtDB with the $l_{\infty}$-norm. On the other hand, we are back in the setting of Theorem 1 with $\lambda _{0} = \mu$, allowing us to use the same framework as before. In Sect. 4 we obtain the results in the $l_{\infty}$-setting by solving this problem with $\mu =(2n_{0})^{-1}$.
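The role of the upper limit on $\mu$ becomes visible after regrouping the objective. The following short derivation uses only the definitions above and is meant as a reading aid.

$$\begin{aligned} m + c(x, \lambda ) +\mu \sum _{i=1}^{n_{0}}\left( (x_{i}^{0} - \tilde{x}_{i}^{0})^{2} - m \right) = (1-\mu n_{0})\, m + \mu \Vert x^{0} - \tilde{x}^{0}\Vert ^{2} + c(x, \lambda ). \end{aligned}$$

On the feasible set every summand $(x_{i}^{0} - \tilde{x}_{i}^{0})^{2} - m$ is non-positive, so the new objective never exceeds $m + c(x,\lambda )$ and the problem remains a relaxation. For $\mu \le n_{0}^{-1}$ the coefficient of m is non-negative and, since $m\ge 0$ and $c(x,\lambda )\ge 0$ at feasible points, the objective is bounded below by zero, whereas for $\mu > n_{0}^{-1}$ it would be unbounded below as $m\rightarrow \infty$. The term $\mu \Vert x^{0} - \tilde{x}^{0}\Vert ^{2}$ contributes the block $2\mu E_{0}$ to the Hessian, which matches the block $2\lambda _{0}E_{0}$ of $M^{\lambda}$ with $\lambda _{0}=\mu$.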

4 Experiments

For each considered sample we apply the procedure described in Sect. 3.3, Algorithm 1 including tightening of the relaxation by introducing additional linear constraints (10) and (11). $\hat{\lambda }$ is chosen for each classifier according to Theorem 1 and the discussion afterwards such that a relative accuracy of at least $c_\lambda = 10^{-4}$ is achieved during the binary search in each $\lambda _{l}$. For the values of other parameters in Algorithm 1 we choose for all tests $c_\gamma = 10^{-8}$ and $n_\gamma =10$. Other methods are tested with the default settings as provided in the corresponding repositories. For ConvAdv by Wong and Kolter (2018) we use the maximum of 200 iterations during Newton’s method for the networks D8, D8R, C, CR (see below) and 20 otherwise. To solve the QP tasks or verify that they are infeasible we use Gurobi (Gurobi Optimization 2018).

Datasets and classifiers The experiments are performed using the MNIST (LeCun et al. 1999) and Fashion-MNIST (Xiao et al. 2017) datasets as well as the tabular datasets IRIS (3 classes, 4 features) and WINE (2 classes, 12 features) from Dua and Graff (2017), scaled such that the feature values lie in the interval [0, 1]. For each of the datasets we use the correctly classified samples among 120 training points to evaluate the verification approaches.

For classification we take ReLU networks consisting of dense and convolutional linear layers. The architectures we used for the image datasets are named D2, D4, D8 (dense networks containing 2, 4 and 8 hidden layers consisting of 50 neurons each, with the exception of the last 4 layers in D8, which have 20 neurons each) and C. The network C consists of two convolutional layers with $4\times 4$ windows, a stride of 2, and 16 and 32 output channels respectively, followed by two dense layers with input/output dimensions of 1568/100 and 100/10. We use network structures similar to those of Wong and Kolter (2018) to enable easier comparison. For each architecture we use normally trained classifiers as well as robustly trained ones (indicated by suffix R, e.g. CR) using the method by Wong and Kolter (2018) with $\epsilon =1.58$ in the $l_{2}$-setting and $\epsilon =0.1$ in the $l_{\infty}$-setting. For the tabular datasets we use a dense network with two hidden layers with 10 neurons each, called D2, and different $\epsilon$ values in the $l_{2}$-setting: 0.113 for IRIS and 0.195 for WINE (and the same $\epsilon =0.1$ in the $l_{\infty}$-setting). The weights as well as the project code are available at github.com/Aleksei-Kuvshinov/QPRel. In Table 1 we show the clean accuracy of the trained networks on the corresponding test sets.

Table 1 Clean accuracy


Competitors We compare our approach QPRel with the following verification methods: ConvAdv by Wong and Kolter (2018) based on the LP relaxation of ReLU constraints (we use its implementation supporting the $l_{2}$-norm by Croce et al. 2019), CROWN by Zhang et al. (2018), which is a layerwise bound propagation technique including performance boosting quadratic approximations and warm start (for dense networks only, since its implementation did not support convolutional layers), and SDPRel by Raghunathan et al. (2018b) based on an SDP relaxation solved by MOSEK.

Metrics The results on MNIST and Fashion-MNIST for the $l_{2}$- and $l_{\infty}$-settings are shown in Tables 3 and 4, respectively. We show the results on the tabular data in Table 2. We run the methods for each of the considered samples and report the following metrics.

Table 2 Experiment results, tabular data


(1) AvgBound: the average value of the bounds obtained from QPRel and the corresponding competitor (the best value is marked bold if it is at least 5% larger than the worst one). To assess the impact of introducing additional linear constraints using a bound propagation method as described in Sect. 3.3, we report the lower bounds obtained by solving QPRel without constraints (10) and (11) in the last column, AvgBound (no BndProp), in Tables 3 and 4. (2) MedRelDiff to QPRel: the median of the relative difference between the bounds (e.g. QPRel minus CROWN, divided by CROWN). Positive values for the lower bounds mean that our bounds are larger for at least half of the samples. (3) $\epsilon$ to hit 50% LB-verified: the number of samples with an adversarial-free radius of $\epsilon$ is monotonically decreasing in $\epsilon$. Therefore, to assess the performance of a verification procedure like QPRel or CROWN we report the smallest $\epsilon$ such that exactly 50% of the samples are successfully verified. The larger this value, the better (the largest values are marked bold).
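The three metrics can be sketched in a few lines. The function names and the sample numbers below are made up for illustration and are not taken from the tables. For metric (3) the sketch uses one possible reading, namely the largest $\epsilon$ at which at least half of the samples are still verified; the exact tie-breaking used for the reported values is an assumption.

```python
import math
from statistics import mean, median


def med_rel_diff(ours, theirs):
    """Median over samples of (ours - theirs) / theirs."""
    return median((o - t) / t for o, t in zip(ours, theirs))


def verified_fraction(bounds, eps):
    """Fraction of samples whose certified radius is at least eps."""
    return sum(b >= eps for b in bounds) / len(bounds)


def eps_at_half_verified(bounds):
    """Largest eps at which at least half of the samples are still verified."""
    ranked = sorted(bounds, reverse=True)
    return ranked[math.ceil(len(ranked) / 2) - 1]


# Made-up lower bounds for four samples (not taken from the tables).
qprel = [0.8, 1.0, 1.2, 1.6]
other = [0.4, 0.5, 1.0, 1.6]

avg_qprel, avg_other = mean(qprel), mean(other)  # AvgBound per method
rel = med_rel_diff(qprel, other)                 # MedRelDiff to QPRel
eps_half = eps_at_half_verified(qprel)           # eps to hit 50% LB-verified
print(avg_qprel, avg_other, rel, eps_half)
```

Because the fraction of verified samples can only drop as $\epsilon$ grows, metric (3) reduces to an order statistic of the per-sample lower bounds.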

Table 3 Experiment results, image data, $l_{2}$-setting


Table 4 Experiment results, image data, $l_{\infty}$-setting


$l_{2}$-setting, state-of-the-art bounds For all considered architectures the lower bounds computed by QPRel are on average tighter than those of the competitors (see Table 3, AvgBound and MedRelDiff), and for the networks with a smaller number of hidden layers they are tighter even for most individual images. Naturally, this results in larger values of $\epsilon$ to hit 50% LB-verified as well. It seems that the competitors tend to underestimate the robustness of the considered networks, especially if they were not trained robustly. For the normally trained convolutional network C on MNIST we were able to improve the competitor’s lower bounds by a factor of 2 on average. In contrast to other verification procedures, which cannot easily verify networks that were not robustly trained, our method is applicable to normally trained networks as well.

While this improvement of the verifiable radius comes at higher computational cost (QPRel is about one order of magnitude slower than the LP-competitors) due to a fundamental difference in complexity of the LP- and QP-tasks, the average runtime per sample is still only seconds or less for the dense and multiple minutes for the convolutional networks. We present a detailed runtime comparison in “Appendix A”.

In the last column of Table 3, we report the lower bound obtained when solving QPRel without introducing additional constraints as described in Sect. 3.3. We observe that the relaxation becomes less tight for networks with more layers and for networks that were trained robustly. We suppose that when the number of layers L becomes larger the binary search between $\underline{\lambda }$ and $\bar{\lambda }$ (see Theorem 1 and the discussion afterwards) in a higher dimensional space results in a point far from the optimal Lagrange multipliers. In particular, the last $\underline{\lambda }_{L-1}$ and $\bar{\lambda }_{L-1}$ defined in (9) become small, such that the gap between $x^{L-1}$ and $W^{L-1}x^{L-2}+b^{L-1}$ has only a very limited effect on the objective function of QPRel. That results in an undesired optimal solution of QPRel with a large propagation gap. By introducing additional linear constraints [especially (11)] we prohibit this behavior, since they bound the propagation gap over the set of feasible points. Overall, incorporating additional linear constraints by using bounds on ReLU’s input has proven to significantly improve our relaxation and the resulting lower bounds.

$l_{\infty}$-setting, comparison with SDP-relaxations In order to compare our method with the work done by Raghunathan et al. (2018b) we generalize QPRel to the $l_{\infty}$-setting as described in Sect. 3.4. Note that the resulting relaxation is looser than the initial QPRel for the $l_{2}$-setting since we bound the $l_{\infty}$-distance from below to make the problem quadratic and convex. To compute the largest $\epsilon$ such that the SDP verification succeeds we perform a binary search on the [0, 1] interval. Since this approach takes longer to run we test it only on the networks D2 and D2R trained with $\epsilon =0.1$ (MNIST data).

In the $l_{\infty}$-setting our bounds are about 3 times smaller than those of SDPRel (see Table 4, MedRelDiff to QPRel), though they are computed three orders of magnitude faster (see “Appendix A”). This shows that the QP relaxation is less suited than the competitors for obtaining tight bounds in the $l_{\infty}$-setting, as already indicated by the arguments above and due to the nature of the quadratic relaxation, but it trades this off against much better efficiency compared to SDPRel.

5 Conclusion and future work

We presented a novel approach to solve the problem of approximating the minimal adversarial perturbations for ReLU networks based on a convex QP relaxation of DtDB. We show that the lower bounds computed with QPRel allow certification of larger neighborhoods. Since convexity of the underlying QP determines the computational efficiency of our approach, we derive necessary and sufficient conditions on the Lagrangian multipliers. The obtained lower bounds in the $l_{2}$-setting show state-of-the-art results, certifying larger radii around the data samples as adversarial-free.

With our contribution we make a step towards robustness verification of deep ReLU-based classifiers. While the proposed theoretical framework is applicable to any linear transformations including dense, convolutional and average pooling layers as well as skip connections, it requires a different analysis when non-ReLU activation functions are used (except leaky ReLU). To be able to apply the approach to a wider class of networks it should be generalized to popular architectures beyond ReLU activations. Last but not least, the excellent results that our method demonstrated for the verification task indicate an intriguing research direction toward robust training. Based on our certificates, the next step towards robust training would be an approach that uses the solution of QPRel to make an update step resulting in a larger certified neighborhood for the correctly classified samples. As our approach does not require a predefined $\epsilon$, that additional regularization acts individually for each sample depending on its current robust neighborhood.

Data availability

The authors provide references to all data and material used in this work.

Code availability

Custom code is provided, including installation instructions. It requires the Gurobi solver; academic licenses are available at gurobi.com.

Notes

https://docs.mosek.com/9.0/pythonapi/parameters.html contains the full list of parameters including their description.

References

Bunel, R. R., Turkaslan, I., Torr, P., Kohli, P., & Mudigonda, P. K. (2018). A unified view of piecewise linear neural network verification. Advances in Neural Information Processing Systems, 31, 4790–4799.
Carlini, N., & Wagner, D. (2017). Towards evaluating the robustness of neural networks. In 2017 IEEE symposium on security and privacy (SP) (pp. 39–57).
Croce, F., Andriushchenko, M., & Hein, M. (2019). Provable robustness of ReLU networks via maximization of linear regions. In AISTATS.
Dua, D., & Graff, C. (2017). UCI machine learning repository. http://archive.ics.uci.edu/ml.
Dutta, S., Jha, S., Sankaranarayanan, S., & Tiwari, A. (2018). Output range analysis for deep feedforward neural networks. In NASA formal methods (pp. 121–138). https://doi.org/10.1007/978-3-319-77935-5_9.
Dvijotham, K., Stanforth, R., Gowal, S., Mann, T. A., & Kohli, P. (2018). A dual approach to scalable verification of deep networks. In Proceedings of the conference on uncertainty in artificial intelligence. http://auai.org/uai2018/proceedings/papers/204.pdf.
Dvijotham, K., Stanforth, R., Gowal, S., Qin, C., De, S., & Kohli, P. (2019). Efficient neural network verification with exactness characterization. In Proceedings of the conference on uncertainty in artificial intelligence. http://auai.org/uai2019/proceedings/papers/164.pdf.
Ehlers, R. (2017). Formal verification of piece-wise linear feed-forward neural networks. In Automated technology for verification and analysis (pp. 269–286). Springer.
Goodfellow, I., Shlens, J., & Szegedy, C. (2015). Explaining and harnessing adversarial examples. In International conference on learning representations. arXiv:1412.6572.
Gurobi Optimization L. (2018). Gurobi optimizer reference manual. http://www.gurobi.com.
Hein, M., & Andriushchenko, M. (2017). Formal guarantees on the robustness of a classifier against adversarial manipulation. Advances in Neural Information Processing Systems, 30, 2266–2276.
Jordan, M., Lewis, J., & Dimakis, A. G. (2018). Provable certificates for adversarial examples: Fitting a ball in the union of polytopes. In Advances in neural information processing systems 33.
Katz, G., Barrett, C., Dill, D. L., Julian, K., & Kochenderfer, M. J. (2017). Reluplex: An efficient SMT solver for verifying deep neural networks. In Computer aided verification (pp. 97–117).
Kurakin, A., Goodfellow, I. J., & Bengio, S. (2016). Adversarial examples in the physical world. In International conference on learning representations. arXiv:1607.02533.
LeCun, Y., Cortes, C., & Burges, C. J. (1999). The MNIST database of handwritten digits. http://yann.lecun.com/exdb/mnist/.
Lomuscio, A., & Maganti, L. (2017). An approach to reachability analysis for feed-forward ReLU neural networks. arXiv e-prints arXiv:1706.07351.
Madry, A., Makelov, A., Schmidt, L., Tsipras, D., & Vladu, A. (2017). Towards deep learning models resistant to adversarial attacks. arXiv preprint arXiv:1706.06083.
Papernot, N., McDaniel, P., Jha, S., Fredrikson, M., Celik, Z. B., & Swami, A. (2016). The limitations of deep learning in adversarial settings. In 2016 IEEE European symposium on security and privacy (EuroS&P) (pp. 372–387).
Raghunathan, A., Steinhardt, J., & Liang, P. (2018a). Certified defenses against adversarial examples. In International conference on learning representations.
Raghunathan, A., Steinhardt, J., & Liang, P. S. (2018b). Semidefinite relaxations for certifying robustness to adversarial examples. Advances in Neural Information Processing Systems, 31, 10877–10887.
Salman, H., Yang, G., Zhang, H., Hsieh, C. J., & Zhang, P. (2019). A convex relaxation barrier to tight robustness verification of neural networks. Advances in Neural Information Processing Systems, 32, 9832–9842.
Szegedy, C., Zaremba, W., Sutskever, I., Bruna, J., Erhan, D., Goodfellow, I., & Fergus, R. (2014). Intriguing properties of neural networks. In International conference on learning representations. arXiv:1312.6199.
Tjeng, V., Xiao, K., & Tedrake, R. (2017). Evaluating robustness of neural networks with mixed integer programming. arXiv preprint arXiv:1711.07356.
Tsuzuku, Y., Sato, I., & Sugiyama, M. (2018). Lipschitz-margin training: Scalable certification of perturbation invariance for deep neural networks. Advances in Neural Information Processing Systems, 31, 6541–6550.
Wang, S., Chen, Y., Abdou, A., & Jana, S. (2018). MixTrain: Scalable training of verifiably robust neural networks. arXiv preprint arXiv:1811.02625.
Weng, L., Zhang, H., Chen, H., Song, Z., Hsieh, C. J., Daniel, L., Boning, D., & Dhillon, I. (2018). Towards fast computation of certified robustness for ReLU networks. In Proceedings of the 35th international conference on machine learning (Vol. 80, pp. 5276–5285).
Wong, E., & Kolter, Z. (2018). Provable defenses against adversarial examples via the convex outer adversarial polytope. In Proceedings of the 35th international conference on machine learning (Vol. 80, pp. 5286–5295).
Xiao, H., Rasul, K., & Vollgraf, R. (2017). Fashion-MNIST: A novel image dataset for benchmarking machine learning algorithms. CoRR arXiv:1708.07747.
Zhang, H., Weng, T. W., Chen, P. Y., Hsieh, C. J., & Daniel, L. (2018). Efficient neural network robustness certification with general activation functions. Advances in Neural Information Processing Systems, 31, 4939–4948.


Funding

Open Access funding enabled and organized by Projekt DEAL. This research was supported by the BMW AG.

Author information

Authors and Affiliations

Data Analytics and Machine Learning Group, Department of Informatics, Technical University of Munich, Munich, Germany
Aleksei Kuvshinov & Stephan Günnemann

Authors

Aleksei Kuvshinov
Stephan Günnemann

Contributions

SG, AK: Conceptualization; SG, AK: Methodology; SG, AK: Formal analysis and investigation; AK: Writing—original draft preparation; SG, AK: Writing—review and editing; SG: Supervision.

Corresponding author

Correspondence to Aleksei Kuvshinov.

Ethics declarations

Conflict of interest

Not applicable, the authors have no conflicts of interest to declare that are relevant to the content of this article.

Ethical approval

The authors approve that the research presented in this paper is conducted following the principles of ethical and professional conduct.

Informed consent

The authors consent to participate in the ECML PKDD 2022 conference.

Consent for publication

Not applicable, the authors use publicly available data only and provide the corresponding references.

Additional information

Editors: Annalisa Appice, Sergio Escalera, Jose A. Gamez, Heike Trautmann.

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Appendices

Appendix

A Runtime

Tables 5, 6, 7 and 8 (see “Appendix C”) show the average runtime and its standard deviation for the considered experiments. During the binary search procedure that we apply for SDPRel, we always make 10 bisection steps. Furthermore, we speed up this approach by modifying MOSEK parameters (see footnote 1 in the Notes and Table 9 for details) such that the optimization procedure terminates earlier (approximately after half of the usual number of iterations). We can still rely on the obtained results since we are not interested in the exact value of the SDP objective, but only in whether it is positive or negative, which was observed to be determined far earlier in the solution process than the point at which the solver would reach a true optimum.

All tasks necessary for the computation of bounds on DtDB for one sample are run on four CPUs (including the solution of QPs and SDPs with Gurobi and MOSEK respectively). Column Runtime-LB (s), sequential QPRel shows the runtime of the whole bound improvement procedure as described in Sect. 3.3. From the comparison of QPRel and the LP-based approaches we see the clear runtime advantage of the latter since they do not involve any optimization task. However, especially in the $l_{2}$-setting this advantage comes at the cost of the verification properties as discussed in Sect. 4. On the other hand, SDPRel with a binary search provides better bounds, but is about three orders of magnitude slower than QPRel.

B Proofs

Lemma 1

Denote the solution of QPRel by $x_{\text{qp}}$ and the square root of its optimal objective value by $d_{\text{qp}}$, let d be the square root of the optimal objective value of DtDB. The following holds:

1.
$d_{\text{qp}} \le d$ and when $c( x_{\text{qp}}, \lambda ) = 0$ we have $d_{\text{qp}} = d$ and $x_{\text{qp}}$ is optimal for DtDB.
2.
For two non-negative $\lambda ^{1}, \lambda ^{2}$ with $\lambda ^{1} \le \lambda ^{2}$ elementwise it holds that $d_{\text{qp}} (\lambda ^{1}) \le d_{\text{qp}}(\lambda ^{2})$.

Proof

Assume $x_{\text{adv}}$ is the optimal solution of DtDB. Then it is an admissible point of QPRel as well and $c( x_{\text{adv}}, \lambda )=0$ since $x_{\text{adv}}^{l} = \text {ReLU}(W^{l} x_{\text{adv}}^{l-1} + b^{l})$ for $l=1,\ldots ,L-1$. Since $x_{\text{qp}}$ is optimal for QPRel and $x_{\text{adv}}$ is an admissible point we get

$$\begin{aligned} d^{2}&=\Vert x_{\text{adv}}^{0} - \tilde{x}^{0}\Vert ^{2} =\Vert x_{\text{adv}}^{0} - \tilde{x}^{0}\Vert ^{2} + c( x_{\text{adv}}, \lambda ) \ge \Vert x_{\text{qp}}^{0} - \tilde{x}^{0}\Vert ^{2} + c( x_{\text{qp}}, \lambda ) =d_{\text{qp}}^{2} \end{aligned}$$

proving the first claim. The second one follows from the fact that $c(x, \lambda )$ for a given x is a linear function of $\lambda$:

$$c(x, \lambda ) = \lambda ^{T} \begin{pmatrix} c(x, e_{1}) \\ \vdots \\ c(x, e_{L-1}) \end{pmatrix},$$

where each $c(x, e_{l}) = \left( x^{l} \right) ^{T}\left( x^{l} -\left( W^{l}x^{l-1} + b^{l}\right) \right)$ is non-negative for admissible x because of the non-negativity constraints (4). Therefore, from the assumption that $\lambda _{{l}}^{1}\le \lambda _{{l}}^{2}$ for all l we obtain for every admissible x

$$c(x, \lambda ^{1}) = \sum _{l=1}^{L-1} \lambda _{{l}}^{1} c(x, e_{l}) \le \sum _{l=1}^{L-1} \lambda _{{l}}^{2} c(x, e_{l}) = c(x, \lambda ^{2}).$$

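Adding $\Vert x^{0} - \tilde{x}^{0}\Vert ^{2}$ to both sides and minimizing over all admissible x preserves the inequality, which gives $d_{\text{qp}} (\lambda ^{1}) \le d_{\text{qp}}(\lambda ^{2})$. $\square$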
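The linearity and monotonicity of the propagation gap in $\lambda$ can be checked numerically on a toy network. The sketch below uses plain Python lists; the network weights, the points and the helper names `layer_gap` and `propagation_gap` are made up for illustration.

```python
def layer_gap(x_l, x_prev, W, b):
    """c(x, e_l) = (x^l)^T (x^l - (W^l x^{l-1} + b^l))."""
    pre = [sum(w * v for w, v in zip(row, x_prev)) + b_i for row, b_i in zip(W, b)]
    return sum(xi * (xi - pi) for xi, pi in zip(x_l, pre))


def propagation_gap(xs, Ws, bs, lam):
    """c(x, lambda) = sum_l lambda_l c(x, e_l); xs[0] is the input, lam[i] weights layer i+1."""
    return sum(l * layer_gap(xs[i + 1], xs[i], Ws[i], bs[i]) for i, l in enumerate(lam))


# Toy network with two hidden layers (2 and 1 neurons), made up for this example.
W1, b1 = [[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0]
W2, b2 = [[1.0, -1.0]], [0.0]
Ws, bs = [W1, W2], [b1, b2]

# An admissible relaxed point: x^l >= 0 and x^l >= W^l x^{l-1} + b^l.
relaxed = [[1.0, -1.0], [1.5, 0.5], [2.0]]
# A point that propagates exactly through the ReLU layers.
exact = [[1.0, -1.0], [1.0, 0.0], [1.0]]

gap_small = propagation_gap(relaxed, Ws, bs, [0.5, 0.25])
gap_large = propagation_gap(relaxed, Ws, bs, [1.0, 0.5])
gap_exact = propagation_gap(exact, Ws, bs, [1.0, 0.5])
print(gap_small, gap_large, gap_exact)  # 1.25 2.5 0.0
```

Doubling every multiplier doubles the gap of the relaxed point, while the exactly propagated point has zero gap for any $\lambda$, which is the situation used in the first claim.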

Theorem 3

Let $W^{1},\ldots ,W^{L-1}$ be the weights of a pre-trained neural network and $\Vert W\Vert$ the spectral norm of an arbitrary matrix. Then the following two conditions for $\lambda$ provide correspondingly a sufficient and a necessary criterion for the matrix $M^{\lambda}$ to be positive semi-definite.

$$\begin{aligned}&\text {(suf. condition)} \quad \lambda _{{1}}\le \frac{2\lambda _{{0}}}{\Vert W^{1}\Vert ^{2}}\quad \text { and }\quad \lambda _{{l}}\le \frac{\lambda _{{l-1}}}{\Vert W^{l}\Vert ^{2}} \quad \text { for }l\ge 2, \end{aligned}$$

(7)

$$\begin{aligned}&\text {(nec. condition)} \quad \lambda _{{l}}\le \frac{4\lambda _{{l-1}}}{\Vert W^{l}\Vert ^{2}} \quad \text { for }l\ge 1. \end{aligned}$$

(8)

Furthermore, we define $\underline{\lambda }$ and $\bar{\lambda }$ that correspondingly satisfy conditions (7) and (8) with equality:

$$\underline{\lambda }_{l} = 2 \prod _{k=1}^{l} \frac{1}{\Vert W^{k}\Vert ^{2}}, \quad \bar{\lambda }_{l} = 4^{l}\prod _{k=1}^{l} \frac{1}{\Vert W^{k}\Vert ^{2}}.$$

(9)

Finally, for a network with a single hidden layer, $M^{\lambda }$ is positive semi-definite even for $\lambda =\bar{\lambda }$, i.e. when (8) holds with equality.
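The multipliers in (9) follow directly from the spectral norms of the weight matrices. A minimal sketch, assuming $\lambda _{0} = 1$ and that the spectral norms have already been computed (for instance as the largest singular value of each $W^{l}$); the function name `lambda_bounds` and the example norms are hypothetical.

```python
def lambda_bounds(spectral_norms):
    """Return (lower, upper) multipliers from (9) for l = 1, ..., L-1, with lambda_0 = 1.

    spectral_norms[l-1] is assumed to hold ||W^l||.
    """
    lower, upper = [], []
    prod = 1.0  # running product of 1 / ||W^k||^2
    for l, norm in enumerate(spectral_norms, start=1):
        prod /= norm ** 2
        lower.append(2.0 * prod)       # satisfies (7) with equality
        upper.append(4.0 ** l * prod)  # satisfies (8) with equality
    return lower, upper


# Made-up spectral norms for a network with two hidden layers.
lower, upper = lambda_bounds([2.0, 4.0])
print(lower, upper)  # [0.5, 0.03125] [1.0, 0.25]
```

For weight matrices with spectral norm larger than one, both sequences shrink with depth, which fits the observation in Sect. 4 that the last multipliers become small for deeper networks.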

Proof

Let the assumptions hold and x be an arbitrary vector from ${\mathbb {R}}^{n}$. First we prove the sufficient condition by deriving a lower bound on $x^{T}M^{\lambda} x$ that is non-negative if (7) holds.

$$\begin{aligned} x^{T}M^{\lambda} x&= \sum _{l=0}^{L-1}\lambda _{l}\Vert x^{l}\Vert ^{2} -\sum _{l=1}^{L-1}\lambda _{l}\left( x^{l}\right) ^{T}W^{l}x^{l-1} \\&= \frac{\lambda _{0}}{2}\Vert x^{0}\Vert ^{2} + \frac{\lambda _{L-1}}{2}\Vert x^{L-1}\Vert ^{2} + \sum _{l=1}^{L-1} \left( \frac{\lambda _{l}}{2}\Vert x^{l}\Vert ^{2} - \lambda _{l} \left( x^{l}\right) ^{T}W^{l}x^{l-1} + \frac{\lambda _{l-1}}{2}\Vert x^{l-1} \Vert ^{2}\right) \\&= \frac{\lambda _{0}}{2}\Vert x^{0}\Vert ^{2} + \frac{\lambda _{L-1}}{2}\Vert x^{L-1}\Vert ^{2} +\sum _{l=1}^{L-1} \left( \frac{\lambda _{l}}{2}\Vert x^{l}\Vert ^{2} - \lambda _{l} \left( x^{l}\right) ^{T}W^{l}x^{l-1} + \frac{\lambda _{l}}{2} \Vert W^{l}x^{l-1}\Vert ^{2}\right) \\&\quad +\sum _{l=1}^{L-1} \left( \frac{\lambda _{l-1}}{2} \Vert x^{l-1}\Vert ^{2} - \frac{\lambda _{l}}{2}\Vert W^{l}x^{l-1}\Vert ^{2}\right) \\&= \frac{\lambda _{0}}{2}\Vert x^{0}\Vert ^{2} + \frac{\lambda _{L-1}}{2}\Vert x^{L-1}\Vert ^{2} +\sum _{l=1}^{L-1}\frac{\lambda _{l}}{2}\Vert x^{l} - W^{l}x^{l-1}\Vert ^{2}\\&\quad + \sum _{l=1}^{L-1} \left( \frac{\lambda _{l-1}}{2}\Vert x^{l-1}\Vert ^{2} - \frac{\lambda _{l}}{2}\Vert W^{l}x^{l-1}\Vert ^{2} \right) \\&= \frac{\lambda _{L-1}}{2}\Vert x^{L-1}\Vert ^{2} +\sum _{l=1}^{L-1} \frac{\lambda _{l}}{2}\Vert x^{l} - W^{l}x^{l-1}\Vert ^{2} +\left( \lambda _{0}\Vert x^{0}\Vert ^{2} - \frac{\lambda _{1}}{2}\Vert W^{1}x^{0}\Vert ^{2} \right) \\&\quad +\sum _{l=2}^{L-1} \left( \frac{\lambda _{l-1}}{2}\Vert x^{l-1}\Vert ^{2} -\frac{\lambda _{l}}{2}\Vert W^{l}x^{l-1}\Vert ^{2} \right) \\&\ge \frac{\lambda _{L-1}}{2}\Vert x^{L-1}\Vert ^{2} +\sum _{l=1}^{L-1} \frac{\lambda _{l}}{2}\Vert x^{l} - W^{l}x^{l-1}\Vert ^{2} +\frac{1}{2}\left( 2\lambda _{0} - \lambda _{1}\Vert W^{1}\Vert ^{2}\right) \Vert x^{0}\Vert ^{2} \\&\quad + \frac{1}{2}\sum _{l=2}^{L-1}\left( \lambda _{l-1} - \lambda _{l} \Vert W^{l}\Vert ^{2}\right) \Vert x^{l-1}\Vert ^{2}, \end{aligned}$$

where we applied the sub-multiplicativity property of the spectral norm, i.e. $\Vert W^{l}x^{l-1}\Vert \le \Vert W^{l}\Vert \Vert x^{l-1}\Vert$, to obtain the last inequality. We see that under the assumption (7) on $\lambda$ and W’s it holds that

$$\lambda _{{1}}\Vert W^{1}\Vert ^{2}\le 2\lambda _{{0}}\text { and }\lambda _{{l}}\Vert W^{l}\Vert ^{2}\le \lambda _{{l-1}}\quad \text { for }l=2,\ldots ,L-1$$

and the lower bound on $x^{T}M^{\lambda} x$ obtained above is a sum of non-negative terms meaning that $x^{T}M^{\lambda} x\ge 0$ for all $x\in {\mathbb {R}}^{n}$.

To prove the necessary condition consider for each $l=1,\ldots ,L-1$ a special vector $\tilde{x}$ (we don’t explicitly label it as dependent on l to avoid overloaded notation) which is everywhere zero except

$$\tilde{x}^{l-1} := \arg \max _{x\in {\mathbb {R}}^{n_{l-1}}} \frac{\Vert W^{l} x\Vert }{\Vert x\Vert }\quad \text { and }\quad \tilde{x}^{l} :=\frac{1}{2} W^{l}\tilde{x}^{l-1}.$$

For $M^{\lambda}$ to be positive semi-definite, it has to satisfy

$$\begin{aligned} 0 \le \tilde{x}^{T}M^{\lambda} \tilde{x}&= \begin{pmatrix} \tilde{x}^{l-1} \\ \tilde{x}^{l} \end{pmatrix}^{T} \begin{pmatrix} \lambda _{{l-1}}E_{l-1} &{} -\frac{1}{2}\lambda _{{l}}\left( W^{l} \right) ^{T} \\ -\frac{1}{2}\lambda _{{l}} W^{l} &{} \lambda _{{l}}E_{l}\\ \end{pmatrix} \begin{pmatrix} \tilde{x}^{l-1} \\ \tilde{x}^{l} \end{pmatrix} \\&= \lambda _{{l-1}}\Vert \tilde{x}^{l-1}\Vert ^{2} - \lambda _{{l}} \left( \tilde{x}^{l}\right) ^{T}W^{l}\tilde{x}^{l-1} + \lambda _{{l}}\Vert \tilde{x}^{l}\Vert ^{2}\\&= \lambda _{{l-1}}\Vert \tilde{x}^{l-1}\Vert ^{2} - \frac{1}{4} \lambda _{{l}}\Vert W^{l} \tilde{x}^{l-1}\Vert ^{2} = \left( \lambda _{{l-1}} - \frac{1}{4}\lambda _{{l}}\Vert W^{l}\Vert ^{2}\right) \Vert \tilde{x}^{l-1}\Vert ^{2}, \end{aligned}$$

which results in the necessary condition (8) as stated above. It remains to prove sufficiency of (8) if the considered network contains one hidden layer. To do so we can reuse the last computation and obtain now for arbitrary $x\in {\mathbb {R}}^{n}$

$$\begin{aligned} x^{T}M^{\bar{\lambda }} x&=\lambda _{{0}}\Vert x^{0}\Vert ^{2} - \lambda _{{1}} \left( x^{1}\right) ^{T}W^{1}x^{0} + \lambda _{{1}}\Vert x^{1}\Vert ^{2} \\&= \lambda _{{0}}\Vert x^{0}\Vert ^{2} - \frac{1}{4}\lambda _{{1}}\Vert W^{1}x^{0}\Vert ^{2} +\lambda _{{1}}\left\| \frac{1}{2}W^{1}x^{0} - x^{1}\right\| ^{2} \\&\ge \left( \lambda _{{0}} - \frac{1}{4}\lambda _{{1}}\Vert W^{1}\Vert ^{2}\right) \Vert x^{0}\Vert ^{2} + \lambda _{{1}}\left\| \frac{1}{2}W^{1}x^{0} - x^{1}\right\| ^{2}. \end{aligned}$$

We see that the last expression remains non-negative for all x when $\lambda _{{1}} =\frac{4\lambda _{{0}}}{\Vert W^{1}\Vert ^{2}}$. $\square$
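In the smallest possible case, a single hidden layer with one input and one neuron, $M^{\lambda }$ is a 2×2 matrix and the statement can be checked by hand. The sketch below does this with the determinant criterion; the scalar weight `w` and the helper names are made up for illustration.

```python
def is_psd_2x2(a, b, d, tol=1e-12):
    """A symmetric matrix [[a, b], [b, d]] is PSD iff a >= 0, d >= 0 and a*d - b*b >= 0."""
    return a >= -tol and d >= -tol and a * d - b * b >= -tol


def m_lambda_is_psd(w, lam1, lam0=1.0):
    """PSD check of M^lambda = [[lam0, -lam1*w/2], [-lam1*w/2, lam1]] for a scalar weight w."""
    off_diagonal = -0.5 * lam1 * w
    return is_psd_2x2(lam0, off_diagonal, lam1)


w = 2.0                     # made-up scalar weight, so ||W^1|| = 2
lam_suf = 2.0 / w ** 2      # sufficient bound (7): 0.5
lam_nec = 4.0 / w ** 2      # necessary bound (8): 1.0

print(m_lambda_is_psd(w, lam_suf))        # True
print(m_lambda_is_psd(w, lam_nec))        # True, the determinant is exactly zero
print(m_lambda_is_psd(w, 1.5 * lam_nec))  # False, (8) is violated
```

Here the determinant is $\lambda _{0}\lambda _{1} - \frac{1}{4}\lambda _{1}^{2}w^{2}$, which is non-negative exactly when $\lambda _{1}\le 4\lambda _{0}/w^{2}$, so (8) is both necessary and sufficient in this case.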

Theorem 4

Solving the Lagrange dual problem of the non-convex DtDB is equivalent to solving the problem

$$\max _{\lambda \in {\mathbb {R}}^{L-1}_{+}}\text {QPRel}(\lambda ) \text { s.t. }M^{\lambda} \text { is positive semi-definite.}$$

Proof

We start with the following formulation of DtDB, where we use $\le$ instead of equality to formulate the complementarity constraints. This is possible because both factors of the product are non-negative due to the other constraints, which are rewritten using the matrix notation from Lemma 4.

$$\begin{aligned}\min _{x\in {\mathbb {R}}^{n}} \Vert x^{0} - \tilde{x}^{0}\Vert ^{2},\quad \text { s.t. } & \left( x^{l} \right) ^{T}\left( x^{l} - \left( W^{l}x^{l-1} + b^{l}\right) \right) \le 0\quad \text { for }l\in [L-1], \\&\bar{M} x - B_{2} \ge 0. \end{aligned}$$

To formulate the Lagrange dual we introduce the non-negative Lagrange multipliers $\lambda$ and $\mu$. We obtain the following formulation of the primal problem

$$\min _{x\in {\mathbb {R}}^{n}} \max _{\begin{array}{c} \lambda \ge 0 \\ \mu \ge 0 \end{array}} \Vert x^{0} - \tilde{x}^{0}\Vert ^{2} + c(x, \lambda ) + \mu ^{T}(B_{2} - \bar{M} x),$$

and by switching the order of the optimization tasks we arrive at the dual task (again, we use the notation from Lemma 4 to rewrite the objective).

$$\max _{\begin{array}{c} \lambda \ge 0 \\ \mu \ge 0 \end{array}} \min _{x\in {\mathbb {R}}^{n}} \frac{1}{2} x^{T}M^{\lambda} x + x^{T}(B_{1} - \bar{M}^{T}\mu ) +\Vert \tilde{x}^{0}\Vert ^{2} + \mu ^{T} B_{2}.$$

Note that for $\lambda$ such that $M^{\lambda}$ is not positive semi-definite there exists x such that $x^{T} M^{\lambda} x < 0$. Therefore, the inner optimization task is unbounded in this case. That means we can introduce the desired constraint on $\lambda$ and solve the convex QP explicitly, obtaining the following equivalent formulation of the dual.

$$\begin{aligned}&\max _{\begin{array}{c} \lambda \ge 0 \\ \mu \ge 0 \end{array}} -\frac{1}{2} (B_{1} - \bar{M}^{T}\mu )^{T} \left( M^{\lambda} \right) ^{-1} (B_{1} - \bar{M}^{T}\mu ) + \Vert \tilde{x}^{0}\Vert ^{2} + \mu ^{T} B_{2}, \\&\quad \text {s.t.}\, M^{\lambda} \text { is positive semi-definite}. \end{aligned}$$

By splitting the maximization task in two we obtain

$$\max _{\begin{array}{c} \lambda \ge 0 \\ M^{\lambda} \text { psd} \end{array}} \max _{\mu \ge 0} -\frac{1}{2} (B_{1} - \bar{M}^{T}\mu )^{T} \left( M^{\lambda} \right) ^{-1} (B_{1} - \bar{M}^{T}\mu ) + \Vert \tilde{x}^{0}\Vert ^{2} + \mu ^{T} B_{2},$$

where the inner task is a convex QP. Therefore, it can be transformed to its dual without introducing the duality gap. Following the steps we have done backwards (now with a fixed $\lambda$) we obtain exactly the QPRel problem as the dual of the inner optimization problem. That concludes the proof since we arrive at the formulation from the claim. $\square$

Lemma 2

Denote $\lambda ^{*}$ as the optimal $\lambda$ defined in Theorem 2, $\hat{\lambda }$ as $\lambda$ we use for verification, $\bar{\lambda }$ as defined in (9), $c(x, \lambda )$ as the propagation gap defined in (5) and $\hat{x}_{qp}$ as the solution of $\text {QPRel}(\hat{\lambda })$. Then we get the following upper bound on the possible improvement of QPRel’s objective function for a $\lambda$ value that is different from our $\hat{\lambda }$:

$$\max _{\begin{array}{c} \lambda \ge 0 \\ M^{\lambda} \text { psd} \end{array}} \left( \text {QPRel}(\lambda ) - \text {QPRel}(\hat{\lambda })\right) =\text {QPRel}(\lambda ^{*}) - \text {QPRel}(\hat{\lambda }) \le c(\hat{x}_{qp}, \bar{\lambda } - \hat{\lambda }).$$

Proof

The first equality holds due to the definition of $\lambda ^{*}$. From there we proceed as follows.

$$\begin{aligned} \text {QPRel}(\lambda ^{*}) - \text {QPRel}(\hat{\lambda })&\le \text {QPRel}(\bar{\lambda }) - \text {QPRel}(\hat{\lambda }) \\&= \left( \min _{x} \Vert x^{0} - \tilde{x}^{0}\Vert ^{2} + c(x,\bar{\lambda })\right) - \left( \Vert \hat{x}^{0} - \tilde{x}^{0}\Vert ^{2} + c(\hat{x},\hat{\lambda }) \right) \\&\le \Vert \hat{x}^{0} - \tilde{x}^{0}\Vert ^{2} + c(\hat{x},\bar{\lambda }) -\left( \Vert \hat{x}^{0} - \tilde{x}^{0}\Vert ^{2} + c(\hat{x},\hat{\lambda }) \right) =c(\hat{x},\bar{\lambda } - \hat{\lambda }). \end{aligned}$$

Here we write $\hat{x}$ for $\hat{x}_{qp}$. To prove the first inequality note that $M^{\lambda ^{*}}$ is positive semi-definite and therefore $\lambda ^{*}$ satisfies the necessary condition (8) from Theorem 1, meaning that $\lambda ^{*} \le \bar{\lambda }$ elementwise. Since QPRel is monotone with respect to $\lambda$ (see Lemma 1) we get that $\text {QPRel} (\lambda ^{*}) \le \text {QPRel}(\bar{\lambda })$. For the second inequality, instead of the minimum objective function value we use the objective of $\text {QPRel}(\bar{\lambda })$ evaluated at $\hat{x}$. Finally, the last equality holds due to the linearity of $c(x,\lambda )$ with respect to $\lambda$. $\square$
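Once the per-layer gaps $c(\hat{x}_{qp}, e_{l})$ of the computed solution are known, the bound of Lemma 2 is a weighted sum. A minimal sketch with made-up numbers; the function name `improvement_bound` is hypothetical.

```python
def improvement_bound(layer_gaps, lam_bar, lam_hat):
    """c(x_hat, lam_bar - lam_hat) = sum_l (lam_bar_l - lam_hat_l) * c(x_hat, e_l)."""
    return sum((ub - cur) * gap for gap, ub, cur in zip(layer_gaps, lam_bar, lam_hat))


# Made-up per-layer gaps of a QPRel solution and two multiplier vectors.
layer_gaps = [0.5, 2.0]
lam_bar = [1.0, 0.25]    # upper multipliers as in (9)
lam_hat = [0.5, 0.125]   # multipliers actually used for verification

bound = improvement_bound(layer_gaps, lam_bar, lam_hat)
print(bound)  # 0.5

# If the solution propagates exactly through the network, nothing can be gained.
print(improvement_bound([0.0, 0.0], lam_bar, lam_hat))  # 0.0
```

Note that the bound refers to the objective of QPRel, i.e. to the squared distance, not to $d_{\text{qp}}$ itself.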

Lemma 3

When using a proper bound propagation mapping, the following holds for the square root of the optimal objective function value $d_{\text{qp}}(\gamma )$ of QPRel (we drop the dependence on $\lambda$ since it is now fixed) solved with the additional constraints (10) and (11) using pre-activation bounds $\underline{a}^{l}(\gamma ), \bar{a}^{l}(\gamma )$.

1.
$d_{\text{qp}}(\gamma _{1}) \ge d_{\text{qp}}(\gamma _{2})$ if $\gamma _{1} \le \gamma _{2}$, i.e. $d_{\text{qp}}(\gamma )$ is monotonically decreasing, where we say that $d_{\text{qp}} (\gamma )=\infty$ if the corresponding QPRel with (10) and (11) is infeasible,
2.
if $d_{\text{qp}}(\gamma ) \le \gamma$ then $d_{\text{qp}} (\gamma )$ is a lower bound on DtDB.

Proof

The first claim follows directly from the fact that QPRel with constraints (10) and (11) has a larger feasible set if the additional constraints are constructed using a larger value for $\gamma$. Assume that $\gamma _{1}\le \gamma _{2}$ and constraints (10), (11) hold for bounds $\underline{a}^{l}(\gamma _{1})$ and $\bar{a}^{l}(\gamma _{1})$, then they hold for $\underline{a}^{l}(\gamma _{2})$ and $\bar{a}^{l}(\gamma _{2})$ automatically.

For (10), if $\underline{a}^{l}(\gamma _{1}) \le W^{l}x^{l-1} + b^{l} \le \bar{a}^{l}(\gamma _{1})$ then $\underline{a}^{l}(\gamma _{2}) \le W^{l}x^{l-1} + b^{l} \le \bar{a}^{l}(\gamma _{2})$ since $\underline{a}^{l}(\gamma _{2}) \le \underline{a}^{l}(\gamma _{1})$ and $\bar{a}^{l}(\gamma _{1}) \le \bar{a}^{l}(\gamma _{2})$. The latter is true due to the fact that proper bound propagation mapping is monotonic.

For (11) assume that $\underline{a}^{l}_{i}(\gamma _{1})< 0 < \bar{a}^{l}_{i}(\gamma _{1})$. Otherwise the only admissible values for $x_{i}^{l}$ and $(W^{l}x^{l-1} + b^{l})_{i}$ satisfy $x_{i}^{l} =\max ((W^{l}x^{l-1} + b^{l})_{i}, 0)$ and are obviously admissible in case of the less restrictive bounds $\underline{a}^{l}_{i}(\gamma _{2})$ and $\bar{a}^{l}_{i}(\gamma _{2})$ as well. With this assumption (11) can be equivalently reformulated as

$$x^{l}_{i} \le \frac{\bar{a}^{l}_{i}}{\bar{a}^{l}_{i} - \underline{a}^{l}_{i}} \left( (W^{l}x^{l-1} + b^{l})_{i} - \underline{a}^{l}_{i} \right) ,$$

where the right hand side is increasing in $\bar{a}^{l}_{i}$ and decreasing in $\underline{a}^{l}_{i}$ (as long as $\underline{a}^{l}_{i}< 0 < \bar{a}^{l}_{i}$). Therefore it remains true if we replace $\underline{a}^{l}_{i}$, $\bar{a}^{l}_{i}$ by any less restrictive bounds.
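The reformulated constraint (11) is the chord of the ReLU function between the two pre-activation bounds. The following sketch evaluates its right hand side for one neuron and compares a tight and a looser pair of bounds; the function name `relu_upper` and the numeric bounds are made up for illustration.

```python
def relu_upper(z, lo, hi):
    """Right hand side of the reformulated constraint (11) for pre-activation z, lo < 0 < hi."""
    if not lo < 0.0 < hi:
        raise ValueError("the reformulation is only used for lo < 0 < hi")
    return hi / (hi - lo) * (z - lo)


# Pre-activation values on a grid covering the tighter interval [-1, 2].
grid = [-1.0 + 0.25 * k for k in range(13)]
tight = [relu_upper(z, -1.0, 2.0) for z in grid]
loose = [relu_upper(z, -2.0, 3.0) for z in grid]

# The chord lies above ReLU on the interval, and looser bounds only raise it.
above_relu = all(t >= max(z, 0.0) - 1e-12 for z, t in zip(grid, tight))
monotone = all(l >= t - 1e-12 for t, l in zip(tight, loose))
print(above_relu, monotone)  # True True
```

Any point that satisfies the constraint for the tighter bounds therefore also satisfies it for the looser ones, which is the monotonicity used in the first claim.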

To prove the second claim we denote as before the square root of the optimal objective function of DtDB by $d_{\text{adv}}$, that is the distance to the closest adversarial, and the corresponding solution by $x_{\text{adv}}\in {\mathbb {R}}^{n}$. Consider two cases.

If $\gamma > d_{\text{adv}} = \Vert \tilde{x}^{0} - x^{0}_{\text{adv}}\Vert$, then $x_{\text{adv}}$ is an admissible point for QPRel with additional linear constraints due to the first property of a proper bound propagation mapping. The objective function value at this point is $d_{\text{adv}}^{2}$ and therefore the minimum $d_{\text{qp}}(\gamma )^{2}$ cannot be larger (similar to the argument in Lemma 1).

If $\gamma \le d_{\text{adv}}$, the assumption $d_{\text{qp}}(\gamma ) \le \gamma$ directly yields $d_{\text{qp}} (\gamma ) \le \gamma \le d_{\text{adv}}$. Note that this assumption plays a crucial role here: it is what makes it possible to prove that $d_{\text{qp}}$ is a valid lower bound on $d_{\text{adv}}$. $\square$

Lemma 4

The objective function of QPRel is equal to

$$x^{T} M^{\lambda} (W) x + x^{T} B_{1}(b, \lambda , \tilde{x}^{0}) + \Vert \tilde{x}^{0}\Vert ^{2},$$

where $B_{1}$ does not depend on $x$, $\lambda _{0} = 1$ and

$$\begin{aligned}&M^{\lambda} _{l,l} = \lambda _{{l}} E_{l} \quad \text { for } l=0,\ldots ,L-1,\\&M^{\lambda} _{l,l-1} = -\frac{1}{2}\lambda _{{l}} W^{l} \quad \text { for } l=1,\ldots ,L-1, \\&M^{\lambda} _{l-1,l} = -\frac{1}{2}\lambda _{{l}} (W^{l})^{T} \quad \text { for } l=1,\ldots ,L-1. \end{aligned}$$

Proof

The proof follows by collecting the quadratic, linear and constant terms in the objective function:

$$\begin{aligned}&\Vert x^{0} - \tilde{x}^{0}\Vert ^{2} + \sum _{l=1}^{L-1}\lambda _{{l}} \left( x^{l} \right) ^{T}\left( x^{l} - \left( W^{l}x^{l-1} + b^{l}\right) \right) \\&\quad = \left( x^{0} \right) ^{T} x^{0} - 2\left( x^{0}\right) ^{T} \tilde{x}^{0} + \Vert \tilde{x}^{0}\Vert ^{2} + \sum _{l=1}^{L-1} \lambda _{{l}} \left( \left( x^{l} \right) ^{T}x^{l} - \left( x^{l} \right) ^{T}W^{l}x^{l-1} -\left( x^{l} \right) ^{T}b^{l} \right) \\&\quad = \underbrace{\sum _{l=0}^{L-1} \lambda _{{l}}\left( x^{l} \right) ^{T} x^{l} - \sum _{l=1}^{L-1} \lambda _{{l}}\left( x^{l} \right) ^{T} W^{l}x^{l-1}}_{\text {quadratic term}} \underbrace{- 2\left( x^{0} \right) ^{T}\tilde{x}^{0} -\sum _{l=1}^{L-1} \lambda _{{l}}\left( x^{l} \right) ^{T}b^{l} + \Vert \tilde{x}^{0}\Vert ^{2}}_{\text {linear and constant terms}}. \end{aligned}$$

From the quadratic term we can identify the blocks of $M^{\lambda}$ as claimed. $\square$
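The block structure can be sanity-checked numerically. The sketch below assembles $M^{\lambda}$ and $B_{1}$ from the blocks of Lemma 4 and from the linear terms collected in the proof, then compares $x^{T} M^{\lambda} x + x^{T} B_{1} + \Vert \tilde{x}^{0}\Vert ^{2}$ with the objective evaluated directly. The layer sizes, random weights and variable names are arbitrary assumptions made for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
sizes = [3, 4, 2]              # dimensions of x^0, x^1, x^2 (so L - 1 = 2)
n_hidden = len(sizes) - 1      # number of layers l = 1, ..., L - 1

# Index 0 is a placeholder so that W[l], b[l], lam[l] match the paper's indices.
W = [None] + [rng.normal(size=(sizes[l], sizes[l - 1])) for l in range(1, n_hidden + 1)]
b = [None] + [rng.normal(size=sizes[l]) for l in range(1, n_hidden + 1)]
lam = [1.0] + list(rng.uniform(0.5, 2.0, size=n_hidden))   # lambda_0 = 1
x_tilde = rng.normal(size=sizes[0])
xs = [rng.normal(size=n) for n in sizes]                   # arbitrary x^0, ..., x^{L-1}

def objective_direct(xs):
    """QPRel objective evaluated term by term, as in the first line of the proof."""
    val = np.sum((xs[0] - x_tilde) ** 2)
    for l in range(1, n_hidden + 1):
        val += lam[l] * xs[l] @ (xs[l] - (W[l] @ xs[l - 1] + b[l]))
    return val

# Assemble M and B_1 block by block.
offsets = np.concatenate([[0], np.cumsum(sizes)])
n = offsets[-1]
M = np.zeros((n, n))
B1 = np.zeros(n)
B1[offsets[0]:offsets[1]] = -2.0 * x_tilde
for l in range(n_hidden + 1):
    cur = slice(offsets[l], offsets[l + 1])
    M[cur, cur] = lam[l] * np.eye(sizes[l])
    if l >= 1:
        prev = slice(offsets[l - 1], offsets[l])
        M[cur, prev] = -0.5 * lam[l] * W[l]
        M[prev, cur] = -0.5 * lam[l] * W[l].T
        B1[cur] = -lam[l] * b[l]

x = np.concatenate(xs)
objective_blocks = x @ M @ x + x @ B1 + x_tilde @ x_tilde
print(np.isclose(objective_direct(xs), objective_blocks))
```

Splitting the cross term evenly between the blocks $(l, l-1)$ and $(l-1, l)$ keeps $M^{\lambda}$ symmetric, which is convenient when checking positive semidefiniteness.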

C Tables

All tables can be found on pp. 24–27.

Table 5 Runtime comparison, MNIST, $l_{2}$

Table 6 Runtime comparison, MNIST, $l_{\infty}$

Table 7 Runtime comparison, F-MNIST, train data, $l_{2}$

Table 8 Runtime comparison, F-MNIST, $l_{\infty}$

Table 9 MOSEK parameters we use to run SDPRel and their default values

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

About this article

Cite this article

Kuvshinov, A., Günnemann, S. Robustness verification of ReLU networks via quadratic programming. Mach Learn 111, 2407–2433 (2022). https://doi.org/10.1007/s10994-022-06132-9

Received: 02 May 2021
Revised: 01 October 2021
Accepted: 06 February 2022
Published: 16 March 2022
Issue Date: July 2022
DOI: https://doi.org/10.1007/s10994-022-06132-9
