An Empirical Study of Derivative-Free-Optimization Algorithms for Targeted Black-Box Attacks in Deep Neural Networks

We perform a comprehensive study on the performance of derivative free optimization (DFO) algorithms for the generation of targeted black-box adversarial attacks on Deep Neural Network (DNN) classifiers, assuming the perturbation energy is bounded by an $\ell_\infty$ constraint and the number of queries to the network is limited. This paper considers four pre-existing state-of-the-art DFO-based algorithms along with the introduction of a new algorithm built on BOBYQA, a model-based DFO method. We compare these algorithms in a variety of settings according to the fraction of images that they successfully misclassify given a maximum number of queries to the DNN. The experiments disclose how the likelihood of finding an adversarial example depends on both the algorithm used and the setting of the attack; algorithms limiting the search for adversarial examples to the vertices of the $\ell_\infty$ constraint work particularly well without structural defenses, while the presented BOBYQA based algorithm works better for especially small perturbation energies. This variance in performance highlights the importance of new algorithms being compared to the state-of-the-art in a variety of settings, and the effectiveness of adversarial defenses being tested using as wide a range of algorithms as possible.

Keywords Derivative Free Optimization · Deep Learning · Black-Box Attacks

Introduction
Deep Neural Networks (DNNs) achieve state-of-the-art performance on a growing number of applications such as acoustic modelling [22], image classification [20], and fake news detection [27], to name but a few. Alongside their growing application, there is a literature on the robustness of deep networks which shows that it is often possible to subtly perturb the input image of a DNN in order to degrade its performance; these perturbations are referred to as adversarial examples [15,36]. For example, see [11,14,25,35,41], where road signs are perturbed so as to be wrongly interpreted by self-driving cars that analyze images of them with DNNs. Methods to generate these adversarial examples are classified according to two main criteria [41]:
- Adversarial Specificity establishes what the aim of the adversary is. In non-targeted attacks, the method perturbs the image in such a way that it is misclassified into any category other than the original one, while in targeted settings the adversary specifies a category into which an image should be misclassified.
- Adversary's Knowledge defines the amount of information available to the adversary. In White-box settings the adversary has complete knowledge of the network architecture and weights, while in the Black-box setting the adversary is only able to obtain the pre-classification output vector.
The White-box setting allows for the use of gradients of a misclassification objective to efficiently compute the adversarial example [4,8,15], while the same optimization formulation in the Black-box setting requires the use of a derivative free approach [2,9,23,29].
In this work we consider the targeted black-box setting. In particular we follow [9] where:
- the perturbation, which causes the network to change the classification, is bounded in magnitude by a specified $\ell_\infty$-norm, $\varepsilon_\infty$, i.e. each pixel in the image cannot be perturbed by more than $\varepsilon_\infty$;
- the number of queries to the DNN needed to generate a targeted adversarial example should be as small as possible.
The Zeroth-Order Optimization (ZOO) algorithm proposed in [9] describes a Derivative Free Optimization (DFO) method for computing adversarial examples in the black-box setting using a coordinate descent optimization method. At the time this was a substantial departure from previous black-box algorithms, which trained a proxy DNN and then employed gradient based white-box attacks on the proxy network [32,38]. It was demonstrated in [9] that these proxy-based algorithms are especially effective when numerous adversarial examples are computed, but become less efficient when an individual adversarial example is considered. Following the introduction of ZOO, there have been numerous improvements using other model-free DFO based approaches, see for example [1-3, 7, 23, 24, 28]. Many of these algorithms were developed in parallel, and so have not yet been benchmarked in a consistent setting, e.g. on the same network.
In this article, we present two frameworks for comparative evaluation of the existing algorithms that claim to require the fewest DNN queries to generate a successful attack. These are: GenAttack [2], which is based on a genetic direct-search method; the Parsimonious algorithm [28], based on a combinatorial direct-search method on the vertices of the perturbation domain; the Square algorithm [3], based on a randomized direct-search method on the vertices of the perturbation domain; and the Frank-Wolfe algorithm [7], based on a momentum mechanism that approximates the gradient via finite differences. We also introduce a new algorithm built on a model-based DFO method [39]. In particular, we consider the Bound Optimization BY Quadratic Approximation (BOBYQA) [33] model-based DFO method, which explicitly develops pseudo-models to approximate the loss function in the optimization problem and then minimizes the loss function using methods from continuous optimization on the generated models. The aforementioned list of algorithms covers the leading classes of DFO algorithms for limited function evaluations, see e.g. [10,26] for recent reviews of DFO methods.

Fig. 1: The success rate (SR) of targeted attacks as a function of the perturbation's allowed $\ell_\infty$ magnitude for the algorithms GenAttack [2], Parsimonious [28], Square [3], Frank-Wolfe [7], and the BOBYQA based algorithm introduced here. Specifically for a ResNet50 network trained on either the CIFAR10 (a) or the ImageNet (b) dataset, with (Adv) and without (Non-Adv) the defense by MadryLab [13]. An attack is considered successful if the method found the targeted adversarial example with fewer than 3,000 or 15,000 queries to the network trained on the CIFAR10 and ImageNet datasets, respectively. Results for the case SR = 0, i.e. when no perturbations were successful, are excluded from the plot.

The two frameworks are structured as follows:
1. In the first setting we consider attacks on DNNs trained on the CIFAR10 and ImageNet datasets, with or without the adversarial defense by MadryLab [13]; this is the canonical setup for the comparison of black-box attacks that was considered in previous literature. We illustrate in Figure 1 a measure of how the performance of the considered algorithms compares, while further refined measures of comparison are included in Section 4. We observe that the algorithms that limit the optimization domain to the $\ell_\infty$ perturbation boundary, i.e. the Parsimonious and Square algorithms, are consistently the most effective. In particular, the Square algorithm achieves the highest success rate (SR) with a fixed maximum number of queries, except when the DNNs have been adversarially trained, and the Parsimonious algorithm achieves the highest SR when a network is trained with the MadryLab defense. However, these results are relative to the current state-of-the-art defense in a field which is in continuous development [12,40], and newly proposed methods usually have a varying effect on the different attacking algorithms; for example, the MadryLab defense [13] that we consider is most effective on the Square algorithm in the ImageNet case.
2. In the second framework, the algorithms are allowed to perturb only a fraction of the pixels in the input; this is especially inspired by the structural defenses that transform the input in the wavelet space [18]. This framework allows us to understand the sensitivity of different algorithms to choices such as initialization, experimental protocol, dataset, and adversarial training. Our results demonstrate that the Parsimonious, Square, and BOBYQA based algorithms alternately perform the best for different maximum perturbation energies.
The results in this paper show that the most likely algorithm to find an adversarial example varies according to the considered setting; the type of dataset, the defense, and the perturbation energy bound have a varying impact on the different algorithms. As a consequence of these experiments, new algorithms should be compared to the state-of-the-art in a variety of settings as done here, and the effectiveness of an adversarial defense should be tested with a variety of algorithms, including the BOBYQA based algorithm introduced in this paper. The outline of the paper is as follows: in Section 2 we present how an adversarial example is generated by solving an optimization problem, and how DFO methods fit in this context. We also introduce the model-based BOBYQA algorithm. In Section 3 we present two popular techniques used in existing methods to improve the efficiency and scalability to high dimensional inputs. Section 4 presents the experimental setup and a comparative analysis of existing algorithms along with a focus on our proposed BOBYQA based algorithm. We close with some concluding remarks in Section 5.

Adversarial Examples Formulated as an Optimization Problem
In classification tasks, a DNN outputs a vector whose length is equal to the number of classes and the DNN parameters are trained to match the maximum element of the given output to the correct class of the input. Adversarial perturbations are obtained by modifying the input in such a way that the maximum element of DNN output corresponds to a target class different from the original one.
Consider a classification operator $F : \mathcal{X} \to \mathcal{C}$ from input space $\mathcal{X}$ to output space $\mathcal{C}$ of classes. A targeted adversarial perturbation $\boldsymbol{\eta}$ to an input $X \in \mathcal{X}$ has the property that it changes the classification to a specified target class $t$, i.e. $F(X) = c$ and $F(X + \boldsymbol{\eta}) = t \neq c$.
Following the formulation in [2]: given an input space $\mathcal{X} = [l, u]^n$, with $l$ and $u$ being respectively the minimum and maximum values of the interval in which the pixels may vary, an output space $\mathcal{C} = \{1, \dots, n_c\}$, where $n_c$ is the number of classes, a maximum energy budget $\varepsilon_\infty$, and a suitable loss function $\mathcal{L}$, the task of computing the adversarial perturbation $\boldsymbol{\eta}$ can be cast as an optimization problem such as

$$\min_{\boldsymbol{\eta}} \; \mathcal{L}(X, \boldsymbol{\eta}) \quad \text{s.t.} \quad \|\boldsymbol{\eta}\|_\infty \le \varepsilon_\infty, \quad l \le X + \boldsymbol{\eta} \le u, \tag{1}$$

where the final two inequality constraints are due to the perturbed image still being an image, i.e. $(X + \boldsymbol{\eta}) \in \mathcal{X}$. Denoting the pre-classification output vector by $f(X)$, i.e. $F(X) = \arg\max f(X)$, the misclassification of $X$ to target label $t$ is achieved by $\boldsymbol{\eta}$ if $f(X + \boldsymbol{\eta})_t \ge \max_{j \neq t} f(X + \boldsymbol{\eta})_j$. Following [2,4,9], a loss function built on this output vector, referred to as (2), is used in this study for computing $\boldsymbol{\eta}$ in (1). Not having access to the internal parameters of the DNN, the gradient of the loss over the input space cannot be readily computed, and instead the adversarial perturbation is found using specially adapted DFO algorithms.
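As an illustration of the quantities above, the following minimal Python sketch evaluates a targeted loss, the success condition $f(X + \boldsymbol{\eta})_t \ge \max_{j \neq t} f(X + \boldsymbol{\eta})_j$, and the projection onto the constraints of (1). The log-ratio form of the loss is an assumption made for illustration, in the spirit of the fitness used by GenAttack [2]; it is not claimed to be the exact expression of (2). The function names and the small constant `tiny` are hypothetical.

```python
import numpy as np


def targeted_loss(outputs: np.ndarray, target: int, tiny: float = 1e-30) -> float:
    """Assumed log-ratio loss: log(sum of non-target outputs) - log(target output).

    `outputs` plays the role of the pre-classification vector f(X + eta).
    The loss decreases as the target entry grows relative to the others.
    """
    others = np.delete(outputs, target)
    return float(np.log(others.sum() + tiny) - np.log(outputs[target] + tiny))


def is_targeted_success(outputs: np.ndarray, target: int) -> bool:
    """Success condition from the text: f_t >= max_{j != t} f_j."""
    return bool(outputs[target] >= np.delete(outputs, target).max())


def project(X: np.ndarray, eta: np.ndarray, eps_inf: float, l: float, u: float) -> np.ndarray:
    """Project eta so that ||eta||_inf <= eps_inf and l <= X + eta <= u, as in (1)."""
    eta = np.clip(eta, -eps_inf, eps_inf)
    return np.clip(X + eta, l, u) - X
```

With outputs `[0.1, 0.7, 0.2]`, the loss is negative for target class 1 (already the arg max) and positive for target class 0, which matches the role of the loss as a quantity to be minimized.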

Derivative Free Optimization for Adversarial Examples
Derivative Free Optimization is a well developed field with numerous classes of methods, see [10] and [26] for reviews on DFO principles and algorithms. Example classes of such methods include: direct search methods such as simplex, model-based methods, hybrid methods such as finite differences or implicit filtering, as well as randomized variants of the aforementioned and methods specific to convex or noisy objectives. For the generation of adversarial examples, the algorithms that we consider rely on three types of DFO methods:
- those where the gradient is computed via finite differences, either by sampling all the canonical directions as in the ZOO attack [9] or random directions as in the Frank-Wolfe algorithm [7];
- those where the solution is thought to be in one of the vertices of the $\ell_\infty$ domain, i.e. $\boldsymbol{\eta}_i \in \{-\varepsilon_\infty, \varepsilon_\infty\}$ for any $i$. The Parsimonious algorithm [28] implements a combinatorial direct-search within the different possible vertices, initializing the perturbation to $-\varepsilon_\infty$ for all the pixels and then switching collections of them to $+\varepsilon_\infty$ when such an action decreases the loss function. The Square algorithm [3] instead implements a randomized direct-search method where square blocks of pixels are iteratively perturbed to be either $+\varepsilon_\infty$ or $-\varepsilon_\infty$;
- those where a direct search over the perturbation domain is performed using a genetic method, such as GenAttack [2].
The optimization formulation in (1) is amenable to virtually all DFO methods, making it unclear which of the methods would be most effective in this context. Further, model-based methods are notably missing from the aforementioned list. Thus for completeness, we introduce an algorithm relying on a model-based method; specifically, BOBYQA is considered given its proven effectiveness in solving complex problems such as climate modelling [37].
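To make the idea of searching only over the vertices of the $\ell_\infty$ domain concrete, the sketch below runs a toy randomized direct search in which contiguous blocks of a one-dimensional perturbation are set to $\pm\varepsilon_\infty$ and a move is kept only if it lowers the loss. It is a deliberately simplified illustration, not the Square [3] or Parsimonious [28] algorithm; the function name, the one-dimensional blocks, and the acceptance rule are assumptions.

```python
import numpy as np


def vertex_random_search(loss, n, eps_inf, block, n_queries, seed=0):
    """Toy direct search restricted to the vertices {-eps_inf, +eps_inf}^n.

    Starts from the all -eps_inf vertex and proposes setting a random
    contiguous block of entries to a random sign; a proposal is accepted
    only when it strictly decreases the loss.
    """
    rng = np.random.default_rng(seed)
    eta = np.full(n, -eps_inf)
    best = loss(eta)  # one query for the starting vertex
    for _ in range(n_queries):
        start = rng.integers(0, n - block + 1)
        candidate = eta.copy()
        candidate[start:start + block] = rng.choice([-eps_inf, eps_inf])
        value = loss(candidate)  # one query per proposal
        if value < best:
            eta, best = candidate, value
    return eta, best
```

Every iterate remains a vertex of the perturbation box by construction, so the $\ell_\infty$ constraint of (1) never needs to be enforced separately.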

Model-Based DFO
Given a set of $q$ samples $Y = \{y_1, \dots, y_q\}$ with $y_i \in \mathbb{R}^n$, model-based DFO methods start by identifying the minimizer of the objective among the samples at iteration $k$, $x_k = \arg\min_{y \in Y} \mathcal{L}(y)$. Following this, a model for the objective function $\mathcal{L}$ is constructed, typically centered around the minimizer. In its simplest form one uses a polynomial approximation to the objective, such as a quadratic model centered in $x_k$,

$$m_k(x_k + p) = a_k + c_k^\top p + \tfrac{1}{2}\, p^\top M_k\, p, \tag{3}$$

with $a_k \in \mathbb{R}$, $c_k, p \in \mathbb{R}^n$, and $M_k \in \mathbb{R}^{n \times n}$ being also symmetric. In a white-box setting one would set $c_k = \nabla \mathcal{L}(x_k)$ and $M_k = \nabla^2 \mathcal{L}(x_k)$, but this is not feasible in the black-box setting as we do not have access to the derivatives of the objective function. Thus at each iteration $k$, the parameters $a_k$, $c_k$ and $M_k$ are usually defined by imposing the interpolation conditions

$$m_k(y_i) = \mathcal{L}(y_i), \quad i = 1, \dots, q, \tag{4}$$

and when $q < 1 + n + n(n + 1)/2$ (i.e. the system of equations is under-determined) other conditions are introduced according to which method is considered. The objective model (3) is considered to be a good estimate of the objective in a neighborhood referred to as a trust region. Once the model $m_k$ is generated, the update step $p$ is computed by solving the trust region problem

$$\min_{p} \; m_k(x_k + p) \quad \text{s.t.} \quad \|p\| \le \Delta, \tag{5}$$

where $\Delta$ is the radius of the region where we believe the model to be accurate; for more details see [31]. The new point $x_k + p$ is added to $Y$ and a prior point is potentially removed.
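The count $1 + n + n(n+1)/2$ is the number of free parameters of a quadratic model with symmetric $M_k$. The sketch below, a minimal illustration with hypothetical function names, fits such a model by pure interpolation when exactly that many well-placed samples are available; it makes no claim about how any particular DFO package builds its models.

```python
import numpy as np


def n_quadratic_params(n: int) -> int:
    """Free parameters of a + c^T p + 0.5 p^T M p with symmetric M."""
    return 1 + n + n * (n + 1) // 2


def quad_features(p: np.ndarray) -> np.ndarray:
    """Features whose coefficients are (a, c, upper triangle of M)."""
    n = len(p)
    quad = [(0.5 if i == j else 1.0) * p[i] * p[j]
            for i in range(n) for j in range(i, n)]
    return np.concatenate(([1.0], p, quad))


def fit_quadratic(Y: np.ndarray, values: np.ndarray, x_k: np.ndarray):
    """Solve the interpolation conditions m(y_i) = L(y_i) for a, c and M.

    Requires exactly n_quadratic_params(n) samples in general position.
    """
    A = np.array([quad_features(y - x_k) for y in Y])
    coef = np.linalg.solve(A, values)
    n = len(x_k)
    a, c = coef[0], coef[1:1 + n]
    M = np.zeros((n, n))
    idx = 1 + n
    for i in range(n):
        for j in range(i, n):
            M[i, j] = M[j, i] = coef[idx]
            idx += 1
    return a, c, M
```

For an image with thousands of pixels this count grows quadratically in $n$, which is one reason under-determined models with far fewer samples are of interest.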
In this paper, we consider an exemplary model-based method called BOBYQA.

BOBYQA
The Bound Optimization BY Quadratic Approximation (BOBYQA) method, introduced in [33], updates the parameters of the model, $a_k$, $c_k$, and $M_k$, in each iteration in such a way as to minimize the change in the quadratic term $M_k$ between iterates while otherwise fitting the sample values:

$$\min_{a_k, c_k, M_k} \; \|M_k - M_{k-1}\|_F^2 \quad \text{s.t.} \quad m_k(y_i) = \mathcal{L}(y_i), \quad i = 1, \dots, q, \tag{6}$$

with $n + 1 < q < 1 + n + n(n + 1)/2$ and $M_k$ initialized as the zero matrix. When the number of samples is $q = n + 1$, the model is considered as linear with $M_k$ set to zero. Every time a new query is made, the sample which is the least important geometrically is removed from $Y$, thus keeping the size of $Y$ fixed.
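The linear special case ($M_k = 0$, $q = n + 1$) can be written down in a few lines. The sketch below fits $a_k$ and $c_k$ by interpolation and then takes the step that minimizes the linear model inside a box. Using an $\ell_\infty$ trust region is an assumption chosen because the step then has a closed form; it is not a description of what the BOBYQA method or its software implementations do internally, and the function names are hypothetical.

```python
import numpy as np


def fit_linear_model(Y: np.ndarray, values: np.ndarray, x_k: np.ndarray):
    """Interpolate a + c^T (y - x_k) through n + 1 samples (M = 0)."""
    P = np.asarray(Y, dtype=float) - x_k
    A = np.hstack([np.ones((len(P), 1)), P])
    coef = np.linalg.solve(A, values)
    return coef[0], coef[1:]


def linear_step(c: np.ndarray, x_k: np.ndarray, delta: float,
                lower: float, upper: float) -> np.ndarray:
    """Minimize c^T p subject to |p_i| <= delta and lower <= x_k + p <= upper.

    The problem is separable, so each coordinate moves as far as allowed
    against the sign of its model gradient entry. Assumes x_k is feasible.
    """
    p = -delta * np.sign(c)
    return np.clip(x_k + p, lower, upper) - x_k
```

Because the model is linear, the step always lands on the boundary of the allowed region in every coordinate with a non-zero coefficient, which is consistent with model quality mattering most when the perturbation energy is small.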

Improving Efficiency and Computational Scalability
Because of the high number of pixels in the input images, the generation of adversarial examples involves solving a high dimensional problem, which makes the direct use of any DFO method impractical; for instance, the application of the BOBYQA method requires the solution of (6), which scales in memory allocation at least quadratically with the input dimension, and thus is computationally too expensive. Consequently, the implementation of DFO based adversarial algorithms relies on strategies to reduce the dimensionality of the problem; this improves the computational scalability along with the efficiency, as demonstrated experimentally. Instead of solving (1) for $\boldsymbol{\eta} \in \mathbb{R}^n$ directly, the DFO based algorithms consider variations of the domain sub-sampling and/or hierarchical lifting techniques. Domain sub-sampling iteratively sweeps over batches of $b \ll n$ variables, while hierarchical lifting clusters and perturbs variables simultaneously, as described in the following sections.

Domain Sub-Sampling
The simplest version of domain sub-sampling consists of partitioning the input dimension into smaller disjoint domains and optimizing the loss function in each of them sequentially. That is, in an $n$-dimensional problem, one considers $k = n/b$ sets of integers, $\{\Omega_j\}_{j=1}^{k}$, of size $b \ll n$ which are disjoint and which cover all of $[n]$. Then (1) is solved sequentially on the dimensions identified by the sets $\Omega_j$. This is possible since the optimization domain is box-like, i.e. every entry of $\boldsymbol{\eta}$ is constrained to an interval by (1), and each dimension's bound is independent from the others. Formally, rather than solving (1) for $\boldsymbol{\eta} \in \mathbb{R}^n$ directly, for each of $j = 1, \dots, k$ one sequentially solves for the variables $\boldsymbol{\eta}_j \in \mathbb{R}^n$, which are only non-zero for entries in $\Omega_j$. The resulting sub-domain perturbations $\boldsymbol{\eta}_j$ are then summed to generate the full perturbation $\boldsymbol{\eta} = \sum_{j=1}^{k} \boldsymbol{\eta}_j$, see Figure 2 as an example. That is, the optimization problem (1) is adapted to repeatedly looping over $j = 1, \dots, k$, each time solving (1) for $\boldsymbol{\eta}_j$ with all entries outside $\Omega_j$ held at zero and the other sub-domain perturbations held fixed, where the sets $\{\Omega_j\}_{j=1}^{k}$ are usually computed again once $j$ is equal to $k$, and the sub-domain perturbations $\boldsymbol{\eta}_j$ are initialized as null.
We identified three possible ways of selecting the sub-domains $\{\Omega_j\}_{j=1}^{k}$:
- In Random Sampling one considers at each iteration a different random sub-sampling of the domain, i.e. $k = 1$. The ZOO algorithm used this kind of sampling [9].
- In Ordered Sampling one generates a random disjoint partitioning of the domain, i.e. $k = n/b$ and $\Omega_j \cap \Omega_l = \emptyset$ for any $j \neq l$. A new partitioning is generated when each variable has been optimized over once. This sampling is implemented in the Parsimonious algorithm.
- In Variance Sampling one still generates a random disjoint partitioning of the domain, but chooses the sub-sampling sets $\{\Omega_j\}_{j=1}^{k}$ in order to optimize over the dimensions that have the highest local variance in intensity first. Specifically, the variables are ordered by the variance in intensity among the 8 neighboring variables (e.g. pixels) in the same color channel of the input $X$. The sets $\{\Omega_j\}_{j=1}^{k}$ are further reinitialized after each loop through $j = 1, \dots, k$.
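The ordered and variance sampling strategies can be sketched as follows. This is a minimal illustration under stated assumptions: images are stored as height × width × channel arrays, border pixels use edge replication to obtain 8 neighbors, and the variance ordering is made fully deterministic, whereas the text describes a partitioning that retains a random component. All function names are hypothetical.

```python
import numpy as np


def ordered_partition(n: int, b: int, rng: np.random.Generator):
    """Random disjoint partition of {0, ..., n-1} into batches of size <= b."""
    perm = rng.permutation(n)
    return [perm[i:i + b] for i in range(0, n, b)]


def local_variance(X: np.ndarray) -> np.ndarray:
    """Variance among the 8 neighbors of each pixel, per color channel.

    X has shape (H, W, C); borders are handled by edge replication
    (an assumption, the source does not specify the border rule).
    """
    H, W, _ = X.shape
    padded = np.pad(X, ((1, 1), (1, 1), (0, 0)), mode="edge")
    neighbors = [padded[1 + dy:1 + dy + H, 1 + dx:1 + dx + W]
                 for dy in (-1, 0, 1) for dx in (-1, 0, 1)
                 if (dy, dx) != (0, 0)]
    return np.var(np.stack(neighbors), axis=0)


def variance_partition(X: np.ndarray, b: int):
    """Batches of flattened indices, highest local variance first."""
    order = np.argsort(-local_variance(X).ravel(), kind="stable")
    return [order[i:i + b] for i in range(0, order.size, b)]
```

On an image containing a single vertical edge, the first batch returned by `variance_partition` contains exactly the pixels adjacent to that edge, which is the behavior the variance ordering is meant to produce.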
The sub-sampling of the domain affects the efficiency with which an algorithm successfully finds an adversarial example. For instance, in Figure 3 we compare how these different sub-sampling techniques affect the BOBYQA based algorithm when generating adversarial examples for the MNIST and CIFAR10 datasets. It can be observed that variance sampling consistently has a higher success rate cumulative distribution function as compared with random and ordered sampling. This suggests that pixels belonging to high-contrast regions are more influential than the ones in low-contrast regions, and hence variance sampling is the preferable ordering.
To simplify the notation in the following section, the optimization variable is considered to be $\boldsymbol{\eta}_j = \boldsymbol{\Omega}_j \tilde{\boldsymbol{\eta}}_j$, where $\tilde{\boldsymbol{\eta}}_j \in \mathbb{R}^b$ and $\boldsymbol{\Omega}_j \in \mathbb{R}^{n \times b}$ is such that $[\boldsymbol{\Omega}_j]_{pq}$ is one if the $q$th element of $\Omega_j$ is $p$, and zero otherwise. The implementation of the variance sampling method at iteration $j$ in a domain of dimension $n$ is summarized in Algorithm 1.
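The selection matrix $\boldsymbol{\Omega}_j$ defined above simply scatters the $b$ reduced variables into their positions in the full $n$-dimensional perturbation. A minimal sketch, using zero-based indices and a hypothetical function name:

```python
import numpy as np


def selection_matrix(omega, n: int) -> np.ndarray:
    """Build Omega in R^{n x b} with Omega[p, q] = 1 iff the q-th element of omega is p."""
    b = len(omega)
    Omega = np.zeros((n, b))
    Omega[np.asarray(omega), np.arange(b)] = 1.0
    return Omega


# Example: the reduced variables (0.5, -0.2) are placed at entries 3 and 0.
Omega = selection_matrix([3, 0], n=4)
eta_j = Omega @ np.array([0.5, -0.2])
```

In practice one would index the array directly rather than form the matrix; the explicit matrix is shown only because it matches the notation used in the text.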

Hierarchical Lifting
The authors of the ZOO attack [9] demonstrated that fewer queries are required to find adversarial examples when pixels are considered in clusters, and not independently. This led to the hierarchical lifting approach, where one optimizes over increasingly higher dimensional spaces at each step, referred to here as level $\ell$; Figure 4 shows how effective this approach is when implementing the BOBYQA based algorithm. These low dimensional spaces are lifted to the image space via a linear lifting, where at each level a linear lifting $D_\ell : \mathbb{R}^{n_\ell} \to \mathbb{R}^n$ is considered and a perturbation $\tilde{\boldsymbol{\eta}}_\ell \in \mathbb{R}^{n_\ell}$ is found to be added to the full perturbation $\boldsymbol{\eta}$, according to

$$\boldsymbol{\eta}_\ell = \boldsymbol{\eta}_{\ell-1} + D_\ell\, \tilde{\boldsymbol{\eta}}_\ell.$$

Here $\boldsymbol{\eta}_0$ is initialized as 0 and the perturbations $\boldsymbol{\eta}_j$ of the previous levels are considered as fixed. An example of how this works is illustrated in Figure 5.
All the methods considered in this work rely on ideas which can be interpreted through this approach. The algorithms that we consider in this work rely on two kinds of linear lifting $D_\ell$, differentiated by the way each scalar in $\tilde{\boldsymbol{\eta}}_\ell$ is associated to a set of pixels in the original image domain $\mathbb{R}^n$; namely the random and the block liftings. The former relates a random set of pixels of the original image to each hyper-variable; this forces the perturbation to be of high-frequency nature, as illustrated in Figure 6(a), which several articles indicate as being the most effective [16,17,34]. The GenAttack and Frank-Wolfe algorithms use a variation of this kind of lifting. The latter instead is based on interpolation operations; a sorting matrix $S_\ell : \mathbb{R}^{n_\ell} \to \mathbb{R}^{n_\ell}$ is applied such that every index of $\tilde{\boldsymbol{\eta}}_\ell$ is uniquely associated to a node of a coarse grid masked over the original image. Afterwards, an interpolation $L_\ell : \mathbb{R}^{n_\ell} \to \mathbb{R}^n$ is implemented over the values in the coarse grid, i.e. $\boldsymbol{\eta} = L_\ell S_\ell \tilde{\boldsymbol{\eta}}_\ell = D_\ell \tilde{\boldsymbol{\eta}}_\ell$. Both the Square and Parsimonious algorithms implement hierarchical lifting with the piece-wise constant interpolation, here referred to as block lifting. At the lower levels the interpolation lifting generates low frequency perturbations, as illustrated in Figure 6(b).

Fig. 5: Example of how the perturbation $\boldsymbol{\eta}$ is generated in a hierarchical lifting method with $n_1 = 4$ and $n_2 = 16$ on an image in $\mathbb{R}^{12 \times 12}$. In (a) the perturbation is $\boldsymbol{\eta} = \boldsymbol{\eta}_0$ and the boxes generated via the grid of dimension $n_1$ are highlighted in red. Once the optimal perturbation $\boldsymbol{\eta}_1$ is found, the perturbation is updated in (b) and the image is further divided with a grid with $n_2$ blocks. The final solution obtained after optimization is shown in (c). Panel titles: "Find $\boldsymbol{\eta}_1$ in the selected grid", "Find $\boldsymbol{\eta}_2$ in the selected grid".

Fig. 6: Examples for (a) random and (b) block liftings. In the random case each pixel in the perturbation is associated to just one element of $\tilde{\boldsymbol{\eta}}_\ell$. Block lifting uses a piece-wise constant interpolation $L_\ell$ over a coarse grid $S_\ell \tilde{\boldsymbol{\eta}}_\ell$ and each block is associated uniquely to one of the variables in $\tilde{\boldsymbol{\eta}}_\ell$. In both cases, the lifting $D_\ell$ is such that each element $[D_\ell]_{ij}$ is either 1 or 0.
Since $n_\ell$ may still be very high, for each level domain sub-sampling is also applied, considering $\tilde{\boldsymbol{\eta}} = \sum_{j=0}^{k} \tilde{\boldsymbol{\eta}}_j$. In the piece-wise constant case with variance sampling, the blocks are ordered according to the variance of mean intensity among neighboring blocks, in contrast to the variance within each block as suggested in [9]. Consequently, at each level the adversarial example is found by solving the following iterative problem

$$\min_{\tilde{\boldsymbol{\eta}}_j} \; \mathcal{L}\big(X + \boldsymbol{\eta},\; D_\ell \boldsymbol{\Omega}_j \tilde{\boldsymbol{\eta}}_j\big).$$

Algorithm 2 GENERATE LIFTING($n_\ell$, $n$)
  $D \leftarrow 0 \in \mathbb{R}^{n \times n_\ell}$
  for $i = 1, \dots, n_\ell$ do
    Generate the set of pixels $S$ that are in the block associated to the $i$-th element of the $n_\ell$-dimensional super-grid.
    for $j \in S$ do
      $D(j, i) = 1$.
    end for
  end for
  Return $D$.
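A block lifting of the kind produced by GENERATE LIFTING can be sketched in Python as follows. The layout is an assumption made for illustration: the image is stored as height × width × channels and flattened in row-major order, and the super-grid has `grid` × `grid` blocks per channel, which is consistent with an initial dimension of the form 2 × 2 × 3. The function name and arguments are hypothetical and do not reproduce the authors' implementation.

```python
import numpy as np


def generate_block_lifting(height: int, width: int, channels: int, grid: int) -> np.ndarray:
    """Piece-wise constant lifting D in R^{n x n_l}.

    n = height * width * channels, n_l = grid * grid * channels.
    D[pixel, block] = 1 when the pixel lies in that block of the same
    channel, so D @ eta_tilde paints each block with a single value.
    """
    n = height * width * channels
    n_l = grid * grid * channels
    D = np.zeros((n, n_l))
    block_row = np.arange(height) * grid // height  # block row of each pixel row
    block_col = np.arange(width) * grid // width    # block column of each pixel column
    for r in range(height):
        for c in range(width):
            for ch in range(channels):
                pixel = (r * width + c) * channels + ch
                block = (block_row[r] * grid + block_col[c]) * channels + ch
                D[pixel, block] = 1.0
    return D
```

Each row of the resulting matrix contains exactly one non-zero entry, so every pixel is driven by a single reduced variable, in line with the entries of the lifting being either 1 or 0.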

Comparison of Derivative Free Methods
In this section, we compare algorithms based on a selection of state-of-the-art DFO methods. In particular we consider the BOBYQA based algorithm [39], the GenAttack algorithm [2], the Parsimonious algorithm [28], the Square algorithm [3] and the Frank-Wolfe algorithm [7] in the following two frameworks:
- Section 4.3 considers the canonical setup for black-box adversarial attacks on which the considered algorithms have been tuned in their respective articles. Specifically, we consider attacks on networks trained adversarially or not on CIFAR10 and ImageNet, two popular datasets in the literature, and with no further defense implemented.
- Section 4.4 considers a setup that simulates structural defenses on which the different algorithms were not tuned. We limit the perturbation to a fixed number of pixels with high variance in intensity, considering attacks on a network non-adversarially trained on the CIFAR10 dataset.
The performance of all algorithms is measured in terms of the distribution of queries needed to successfully find adversaries to identical networks given a fixed $\ell_\infty$ perturbation constraint and the same input images.

Parameter Setup for Algorithms
The experiments use publicly available implementations for the GenAttack [2], Parsimonious [28], Square [3], and Frank-Wolfe [7] algorithms¹, using the same hyper-parameter setting and hierarchical lifting approach as suggested by the respective authors.
For the BOBYQA based algorithm [39], we observed from Figure 3 that the loss function is influenced the most by the pixels in high-contrast areas. Hence, we first apply the variance sub-sampling method followed by block lifting as described in Section 3.2². Here, we consider an initial domain of dimension $n_1 = 2 \times 2 \times 3$, and double the refinement of the grid at each layer, i.e. $n_{\ell+1} = 4 n_\ell$. Moreover, we observe that for (6) the choice of a linear model to approximate the loss function works best, and we consequently consider the linear approximation in this paper, i.e. $M = 0$ and $q = n + 1$ at all iterations, see [39]. The BOBYQA based algorithm is summarized in Algorithm 3 and a Python implementation of the proposed algorithm based on the BOBYQA package from [6] is available on GitHub³.

¹ GenAttack: https://github.com/nesl/adversarial_genattack
  Parsimonious algorithm: https://github.com/snu-mllab/parsimonious-blackbox-attack
  Square algorithm: https://github.com/max-andr/square-attack
  Frank-Wolfe algorithm: https://github.com/uclaml/Frank-Wolfe-AdvML
² The choice for this kind of lifting was driven by preliminary experiments in which we considered also a grid method with linear interpolation and a random lifting method as well. It is possible to run the analysis thanks to the code in footnote 3.

Algorithm 3 BOBYQA Based Algorithm
1: Input: Image $X \in \mathbb{R}^n$, target label $t$, maximum perturbation $\varepsilon_\infty$, neural net $F$, initial hierarchical level grid dimensions $m$, maximum number of queries $n_{max}$, batch sampling size $b$, and maximum number $\kappa$ of queries that we are allowed to do for each batch.
2: Initialize $\boldsymbol{\eta} \leftarrow 0 \in \mathbb{R}^n$, $n_{eval} = 0$, $\ell = 1$, $n_\ell = 12$.
3: while $\arg\max F(X + \boldsymbol{\eta}) \neq t$ and $n_{eval} < n_{max}$ do
4:   # Compute the number of sub samplings necessary to cover the whole domain
5:   num_sub = n/(n * b)
6:   # Generate the lifting matrix
7:   $D_\ell$ = GENERATE LIFTING($n_\ell$, $n$)
8:   # Minimize on all the sampled sub-domains
9:   for $j = 1, \dots,$ num_sub do
10:     # Compute the matrix which selects $b$ dimensions of the $m$-dimensional domain.
19:   end for
20:   $\ell$ += 1, $n_\ell$ *= 4.
21: end while
22: if $\arg\max F(X + \boldsymbol{\eta}) = t$ then
23:   The perturbation is successful.
24: else if $n_{eval} > n_{max}$ then
25:   The perturbation was not successful with $n_{max}$ iterations.

4: for $j = 1, \dots, \kappa - b$ do
5:   Add $x$ to the set of samples and get rid of the least informative one according to [33].
6:   Build the new model $m_j$ according to (6).
7:   Find minimizer $x$ of $m_j$ such that $D_\ell \boldsymbol{\Omega}_j x \in [a, b]$.
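The refinement rule $n_{\ell+1} = 4 n_\ell$ starting from $n_1 = 2 \times 2 \times 3 = 12$ gives a short sequence of level dimensions. The sketch below lists them up to a given image dimension; stopping once the next level would exceed the image dimension is an assumption made for illustration, and the function name is hypothetical.

```python
def level_dimensions(n_first: int, n_full: int, factor: int = 4):
    """Dimensions n_1, n_2, ... with n_{l+1} = factor * n_l, capped at n_full."""
    dims = []
    n_l = n_first
    while n_l <= n_full:
        dims.append(n_l)
        n_l *= factor
    return dims


# A 32 x 32 x 3 image (CIFAR10 size) is reached exactly at the fifth level.
cifar_levels = level_dimensions(2 * 2 * 3, 32 * 32 * 3)
```

For the CIFAR10 image size the sequence is 12, 48, 192, 768, 3072, so the finest level coincides with one variable per pixel and channel.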

Dataset and Neural Network Specifications
We performed experiments using the popular ResNet50 architecture [21] with two training scenarios: one with the unperturbed images, and one with the defense⁴ proposed in [13]. The number of experiments and the choice of the targets for each individual dataset are described below.
CIFAR10 The CIFAR10 dataset contains images from 10 classes and of dimension 32×32×3. To generate a comprehensive distribution for the queries at each energy budget, ten correctly classified images are considered for each class, and each of them is targeted to all of the 9 remaining classes; this way we generate a total of 900 attacks per maximum perturbation energy per adversarial method.
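The count of 900 attacks follows from 10 classes × 10 images per class × 9 target classes. A small enumeration, with hypothetical names, makes the protocol explicit:

```python
def cifar_attack_triples(n_classes: int = 10, images_per_class: int = 10):
    """All (true class, image index within class, target class) triples with target != true class."""
    return [(c, i, t)
            for c in range(n_classes)
            for i in range(images_per_class)
            for t in range(n_classes)
            if t != c]


attacks = cifar_attack_triples()  # 10 * 10 * 9 = 900 triples
```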
ImageNet This dataset contains millions of images with a dimension of 299×299×3 divided among 1000 classes. Because of the high dimensionality and number of classes, random images are attacked considering a random target class. We conducted 200 and 160 tests for networks trained both with and without adversarial training per maximum perturbation energy.

Results for Standard and MadryLab Trained DNNs
In Figures 7 and 8 we present the cumulative fraction of images successfully misclassified (abridged by CDF for cumulative distribution function) as a function of the number of queries to the DNN for different maximum perturbation energies $\varepsilon_\infty$. The pixels are normalized to be in the interval $(-1/2, 1/2)$; hence, $\varepsilon_\infty = 0.1$ would imply that any pixel is allowed to change by 10% of the total intensity range from its initial value. The CDFs are illustrated so that we can easily see which method has been able to misclassify the largest fraction of images in the given test set for a fixed number of queries to the DNN. For the CIFAR10 dataset in Figure 7, we observe that algorithms that search for the perturbation directly in the vertices of the perturbation domain require the least amount of network queries. In the case of non-adversarially trained networks, the Square algorithm is able to misclassify using the least number of queries; this is demonstrated by its associated solid green CDF being consistently above that of the other methods. Specifically, when $\varepsilon_\infty = 0.05$, at 1,000 queries the Square algorithm has a CDF of 0.97 compared to 0.94 and 0.88 for the Parsimonious and BOBYQA methods respectively, and for $\varepsilon_\infty = 0.005$ at 3,000 queries Square achieves a CDF of 0.20, which is 50% higher than Parsimonious and BOBYQA. When the net is instead trained adversarially (dashed lines), the Square algorithm loses a lot of its effectiveness, becoming comparable to the BOBYQA based method, while the Parsimonious algorithm achieves almost always the highest fraction of successfully perturbed images for any given maximum number of queries. For example, when $\varepsilon_\infty = 0.05$ at 3,000 queries the CDF of Parsimonious is 0.29 compared to 0.25 and 0.23 for Square and BOBYQA.
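The success rate CDF used in these figures can be computed from the number of queries each attack needed. The sketch below is a minimal illustration; representing failed attacks by `numpy.inf` and the function name are assumptions, not the authors' evaluation code.

```python
import numpy as np


def success_cdf(queries, max_queries: int):
    """Fraction of all attacks that succeeded within q queries, for q = 1..max_queries.

    `queries` holds the number of queries used by each attack, with
    np.inf for attacks that never succeeded. Failures and attacks that
    needed more than max_queries count in the denominator only.
    """
    q = np.asarray(queries, dtype=float)
    grid = np.arange(1, max_queries + 1)
    succeeded = np.sort(q[np.isfinite(q) & (q <= max_queries)])
    cdf = np.searchsorted(succeeded, grid, side="right") / q.size
    return grid, cdf
```

Because the denominator is the total number of attacks, the final value of the curve is the success rate at the query budget, which is the quantity compared between algorithms in the text.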
In the ImageNet dataset, see Figure 8(a), we observe that an adversarial method can be especially susceptible to particular defenses. Specifically, when the network is trained without a defense, the Square algorithm has a success rate CDF that is consistently higher than the other methods, but the success rate CDF for the Square algorithm is decreased by the MadryLab defense so that it is substantially less effective than Parsimonious and BOBYQA. In Figure 8(a), Parsimonious has a CDF of 0.33 at 15,000 queries, while BOBYQA has 0.24 and Square 0.07. The rate with which the CDFs decrease as the maximum perturbation energy $\varepsilon_\infty$ decreases also differs by algorithm. The CDF for Square decreases moderately faster than for Parsimonious, such that Square has a consistently higher CDF than Parsimonious for $\varepsilon_\infty = 0.1$ in Figure 8(a) but a consistently lower one in Figure 8(d). Moreover, the success rate for BOBYQA decreases the slowest with $\varepsilon_\infty$, such that in Figure 8 its CDF is similar to or greater than that of Parsimonious. Specifically, in Figure 8(d) at 15,000 queries the final CDF of the BOBYQA algorithm is 1.42 times higher than that of the Square algorithm.
The Frank-Wolfe algorithm is able to achieve results comparable to those of the methods above when considering the small-dimensional problem of CIFAR10 with a very low maximum perturbation energy. However, when considering the ImageNet case and the adversarially trained DNNs, the Frank-Wolfe algorithm has a substantially lower success rate CDF; e.g. in the ImageNet case with non-adversarial training, the Square algorithm achieves a CDF 1.66 times higher than the Frank-Wolfe algorithm when $\varepsilon_\infty = 0.05$.
Finally, GenAttack has a higher success rate CDF than the Frank-Wolfe algorithm in the ImageNet case for $\varepsilon_\infty = 0.1$, see Figure 8(a), but, besides this case, it consistently achieves the lowest success rate.

Results with Fixed Pixel Count Constraints
In addition to network training designed to increase robustness, such as the MadryLab defense considered previously, there are a multitude of other defenses and real world constraints [19]. The relative success rate, or other characteristics, of adversarial algorithms can be expected to differ in these diverse settings. To demonstrate this, we consider one such setting where the maximum number of pixels allowed to be perturbed is limited. This is motivated by the defenses where network inputs are thresholded in a wavelet domain to exclude high frequency perturbations [18], as well as by real world constraints such as attacks designed to appear structured, for instance localized perturbations designed to look like graffiti [14,30]. We allow the algorithms to perturb only the fixed selection of the 1,000 pixels of the targeted image that have the highest variance in intensity in their channel neighborhood. From the previous results it is possible to identify three methods that work consistently better than the others, and thus only these will be considered, namely the Parsimonious, the Square, and the BOBYQA based algorithms. To allow the perturbations to be limited to the selected pixels, we consider the Square algorithm with squares the size of a single pixel, the Parsimonious algorithm on the finest grid, and the BOBYQA algorithm without the hierarchical lifting, i.e. $D_1 = I$ where $I$ is the identity matrix.
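Restricting an attack to the 1,000 entries with the highest local variance amounts to building a fixed mask and zeroing the perturbation elsewhere. The sketch below assumes a precomputed array of local variances with the same shape as the image; the function names are hypothetical and the tie-breaking among equal variances is whatever `numpy.argpartition` returns.

```python
import numpy as np


def top_variance_mask(variance: np.ndarray, k: int = 1000) -> np.ndarray:
    """Boolean mask selecting the k entries with the largest local variance."""
    flat = variance.ravel()
    idx = np.argpartition(flat, -k)[-k:]
    mask = np.zeros(flat.size, dtype=bool)
    mask[idx] = True
    return mask.reshape(variance.shape)


def restrict(eta: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Zero the perturbation outside the allowed entries."""
    return np.where(mask, eta, 0.0)
```

With the mask fixed before the attack starts, every algorithm in the comparison searches over the same 1,000-dimensional sub-domain.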
The results reported in Figure 9 suggest that when the domain is dimensionally limited, the most efficient algorithm changes according to the allowed maximum perturbation energy. As the maximum perturbation energy decreases and the linear model becomes more accurate, the BOBYQA method manages to achieve a higher SR than both the Square and Parsimonious algorithms, unlike in the previous experiments. Moreover, the Parsimonious algorithm behaves almost identically to the Square algorithm for high energy bounds, but becomes more efficient when the maximum energy is $\varepsilon_\infty = 0.05$. We also considered experiments on ImageNet, but limiting the number of pixels that could be perturbed did not allow for any successful misclassification with fewer than 15,000 queries.

Discussion and Conclusion
We have compared for the first time how the existing GenAttack [2], Parsimonious [28], Square [3], and Frank-Wolfe [7] algorithms, and the newly introduced BOBYQA based method, behave when the available $\ell_\infty$ energy for a perturbation varies, and when adversarial training or a structural defense is considered.
The results suggest that those methods limiting the search for an adversarial example to the vertices of the $\ell_\infty$ perturbation domain generally work better. Whilst the Square algorithm is especially effective on the non-adversarially trained networks, the Parsimonious algorithm manages to outperform every other approach when the networks are adversarially trained with the MadryLab implementation. Furthermore, the Parsimonious algorithm performs better than Square when considering the structural defense that limits the attack to a subset of pixels, suggesting that the hyper-parameters of an algorithm based on combinatorial search are robust to the setting in which it is applied. The BOBYQA based algorithm was introduced in this paper to explore how model-based approaches compare to the state-of-the-art algorithms. In almost all the experiments it achieves a success rate CDF comparable to those of the Parsimonious and the Square algorithms; it achieves the state-of-the-art success rate at saturation for low maximum perturbation energy constraints both in the ImageNet case and in the pixel constrained problem. Moreover, new dimensionality reduction techniques that are being considered in DFO, see for example [5], might improve the results observed here and lead to a state-of-the-art algorithm for the generation of adversarial examples.
In conclusion, we find that both the structure of the algorithm and the attack setting can affect the algorithm's performance. These observations highlight the importance of comparing any new algorithm to the state-of-the-art in a variety of different settings, as is done here. Similarly, the effectiveness of an adversarial defense for DNNs should always be tested using as wide a range of algorithms as possible.