Semantic Adversarial Deep Learning
Abstract
Fueled by massive amounts of data, models produced by machine-learning (ML) algorithms, especially deep neural networks, are being used in diverse domains where trustworthiness is a concern, including automotive systems, finance, health care, natural language processing, and malware detection. Of particular concern is the use of ML algorithms in cyber-physical systems (CPS), such as self-driving cars and aviation, where an adversary can cause serious consequences.
However, existing approaches to generating adversarial examples and devising robust ML algorithms mostly ignore the semantics and context of the overall system containing the ML component. For example, in an autonomous vehicle using deep learning for perception, not every adversarial example for the neural network leads to a harmful consequence. Moreover, one may want to prioritize the search for adversarial examples towards those that significantly modify the desired semantics of the overall system. Along the same lines, existing algorithms for constructing robust ML algorithms ignore the specification of the overall system. In this paper, we argue that the semantics and specification of the overall system have a crucial role to play in this line of research. We present preliminary research results that support this claim.
1 Introduction
Machine learning (ML) algorithms, fueled by massive amounts of data, are increasingly being utilized in several domains, including healthcare, finance, and transportation. Models produced by ML algorithms, especially deep neural networks (DNNs), are being deployed in domains where trustworthiness is a big concern, such as automotive systems [35], finance [25], health care [2], computer vision [28], speech recognition [17], natural language processing [38], and cybersecurity [8, 42]. Of particular concern is the use of ML (including deep learning) in cyber-physical systems (CPS) [29], where the presence of an adversary can cause serious consequences. For example, much of the technology behind autonomous and driverless vehicle development is “powered” by machine learning [4, 14]. DNNs have also been used in airborne collision avoidance systems for unmanned aircraft (ACAS Xu) [22]. However, in designing and deploying these algorithms in critical cyber-physical systems, the presence of an active adversary is often ignored.
Adversarial machine learning (AML) is a field concerned with analyzing the vulnerability of ML algorithms to adversarial attacks, and with using such analysis to make ML algorithms robust to attacks. It is part of the broader agenda for safe and verified ML-based systems [39, 41]. In this paper, we first give a brief survey of the field of AML, with a particular focus on deep learning. We focus mainly on “external attacks”: attacks on the outputs or models produced by ML algorithms that occur after training, which are especially relevant to cyber-physical systems (e.g., for a driverless car, the ML algorithm used for navigation has already been trained by the manufacturer once the “car is on the road”). These attacks are more realistic and are distinct from other types of attacks on ML models, such as attacks that poison the training data (see [18] for a survey of such attacks). We survey attacks caused by adversarial examples, which are inputs crafted by adding small, often imperceptible, perturbations to force a trained ML model to misclassify them.
We contend that the work on adversarial ML, while important and useful, is not enough. In particular, we advocate for the increased use of semantics in adversarial analysis and design of ML algorithms. Semantic adversarial learning explores a space of semantic modifications to the data, uses system-level semantic specifications in the analysis, utilizes semantic adversarial examples in training, and produces not just output labels but also additional semantic information. Focusing on deep learning, we explore these ideas and provide initial experimental data to support them.
Roadmap. Section 2 provides the relevant background. A brief survey of adversarial analysis is given in Sect. 3. Our proposal for semantic adversarial learning is given in Sect. 4.
2 Background
Background on Logic. Temporal logics are commonly used for specifying desired and undesired properties of systems. For cyber-physical systems, it is common to use temporal logics that can specify properties of real-valued signals over real time, such as signal temporal logic (STL) [30] or metric temporal logic (MTL) [27].
A signal is a function \(s : D \rightarrow S\), with \(D \subseteq \mathbb {R}_{\ge 0}\) an interval and either \(S \subseteq \mathbb {B}\) or \(S \subseteq \mathbb {R}\), where \(\mathbb {B}= \{\top , \bot \}\) and \(\mathbb {R}\) is the set of reals. Signals defined on \(\mathbb {B}\) are called boolean, while those defined on \(\mathbb {R}\) are called real-valued. A trace \(w = \{s_1,\dots , s_n\}\) is a finite set of real-valued signals defined over the same interval D. We use variables \(x_i\) to denote the value of a real-valued signal at a particular time instant.
Definition 1
A trace w satisfies a formula \(\varphi \) if and only if \(w,0\,\models \,\varphi \), in short \(w\,\models \,\varphi \). STL also admits a quantitative or robust semantics, which we omit for brevity; it provides quantitative information about a formula, indicating how strongly the specification is satisfied or violated by a given trace.
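To make the Boolean and quantitative semantics concrete, the following sketch monitors a formula of the form \(\mathtt{G}\,(s \ge c)\) over a uniformly sampled trace. The function names and sample values here are ours, purely for illustration.

```python
def always_geq(trace, c):
    """Boolean semantics of G (s >= c): the predicate must hold at every sample."""
    return all(s >= c for s in trace)

def robustness_always_geq(trace, c):
    """Quantitative (robust) semantics: the worst-case margin min_t (s(t) - c).
    Positive means satisfied with that margin; negative means violated."""
    return min(s - c for s in trace)

trace = [3.0, 2.5, 2.25, 2.75]            # sampled real-valued signal
print(always_geq(trace, 2.0))             # True
print(robustness_always_geq(trace, 2.0))  # 0.25
```

The robustness value 0.25 says the trace satisfies the specification, but only with a 0.25 margin at its weakest point.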
3 Attacks

ML picks an ordered training set \(S = ((x_i,y_i))_{i=1}^m\).

A picks an ordered training set \(\widehat{S} = ((\hat{x_i},\hat{y_i}))_{i=1}^r\), where \(r = \lfloor \epsilon m \rfloor \).
ML learns on \(S \cup \widehat{S}\) by essentially minimizing $$ \min _{w \in H} L_{ S \cup \widehat{S}} (w). $$
The attacker wants to maximize the above quantity and thus chooses \( \widehat{S}\) such that \(\min _{w \in H} L_{ S \cup \widehat{S}} (w)\) is maximized. For a recent paper on certified defenses against such attacks, we refer the reader to [44]. In model-extraction attacks, an adversary with black-box access to a classifier, but no prior knowledge of the parameters of the ML algorithm or its training data, aims to duplicate the functionality of (i.e., steal) the classifier by querying it on well-chosen data points. For an example of model-extraction attacks, see [45].
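The poisoning game above can be made concrete for least-squares regression, where the defender's inner minimization has a closed form. The following sketch is a toy illustration (the candidate poison sets and all constants are ours, not an algorithm from the literature): the attacker evaluates each candidate \(\widehat{S}\) and keeps the one maximizing the post-training loss.

```python
import numpy as np

def min_loss(X, y):
    """Inner problem: the defender's best response min_w L(w), in closed form."""
    w, *_ = np.linalg.lstsq(X, y, rcond=None)
    return float(np.mean((X @ w - y) ** 2))

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))
y = X @ np.array([1.0, -2.0])            # clean data: perfectly realizable

# Attacker's outer problem: among candidate poison sets S_hat, keep the one
# that maximizes min_w L_{S ∪ S_hat}(w).
candidates = [
    (np.array([[3.0, 3.0]]), np.array([10.0])),   # grossly inconsistent point
    (np.array([[0.1, 0.1]]), np.array([0.0])),    # nearly consistent point
]
worst = max(candidates,
            key=lambda c: min_loss(np.vstack([X, c[0]]),
                                   np.concatenate([y, c[1]])))
```

On this toy instance the grossly inconsistent point wins: a single mislabeled far-out point forces a nonzero loss even at the refit optimum.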
In this paper, we consider test-time attacks. We assume that the classifier \(F_w\) has been trained without any interference from the attacker (i.e., no training-time attacks). Roughly speaking, an attacker has an image \(\mathbf{x}\) (e.g., an image of a stop sign) and wants to craft a perturbation \(\mathbf{\delta }\) so that the label of \(\mathbf{x}+\mathbf{\delta }\) is what the attacker desires (e.g., a yield sign). The next subsection describes test-time attacks in detail. We will sometimes refer to \(F_w\) as simply F, but the hypothesis w is lurking in the background (i.e., whenever we refer to w, it corresponds to the classifier F).
3.1 Test-Time Attacks

\(F( \mathbf{x}+ \mathbf{\delta }) \in T\)
The set T constrains the perturbed vector \(\mathbf{x}+\mathbf{\delta }\)^{1} to have its label (according to F) in the set T. For misclassification problems, the labels of \(\mathbf{x}\) and \(\mathbf{x}+\mathbf{\delta }\) are different, so we have \(T = {\mathcal C} \setminus \{ F(\mathbf{x}) \}\). For targeted misclassification we have \(T = \{ t \}\) (for \(t \in {\mathcal C}\)), where t is the target that an attacker wants (e.g., the attacker wants t to correspond to a yield sign).

\(\mathbf{\delta }\cdot \mathbf{M}= 0\)
The vector \(\mathbf{M}\) can be considered a mask: if \(M [i] = 1\) then \(\mathbf{\delta }[i]\) is forced to be 0, so an attacker can only perturb dimension i if \(M [i] = 0\). Consequently, \(\mathbf{\delta }\) lies in a k-dimensional subspace, where k is the number of zero entries in \(\mathbf{M}\). This constraint is important if an attacker wants to perturb only a certain region of the image (e.g., the glasses in a picture of a person).
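The mask constraint can be satisfied by construction: zero out \(\mathbf{\delta }\) on every dimension where \(M[i] = 1\). A minimal sketch (the vectors are arbitrary illustrations):

```python
import numpy as np

delta_raw = np.array([0.3, -0.2, 0.5, 0.1])   # unconstrained perturbation
M = np.array([1, 0, 0, 1])                    # 1 = frozen dimension, 0 = attackable
delta = delta_raw * (1 - M)                   # δ[i] forced to 0 wherever M[i] = 1
print(np.dot(delta, M))                       # 0.0: δ·M = 0 holds by construction
# δ now lives in a k-dimensional subspace; here k = 2, the number of zeros in M.
```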

Convexity
Notice that even if the metric \(\mu \) is convex (e.g., \(\mu \) is the \(L_2\) norm), the optimization problem is not convex because of the constraint involving F (the constraint \(\mathbf{\delta }\cdot \mathbf{M}= 0\), by contrast, is convex). In general, solving convex optimization problems is more tractable than solving non-convex ones [34].
CW Targeted Misclassification Attack. The CW attack [5] is widely believed to be one of the most “powerful” attacks. The reason is that Carlini and Wagner cast their problem as an unconstrained optimization problem and then use a state-of-the-art solver (i.e., Adam [24]). In other words, they leverage advances in optimization for the purpose of generating adversarial examples.
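To illustrate the reformulation idea (not the actual CW attack), the sketch below folds a targeted-misclassification constraint into the unconstrained objective \(\Vert \mathbf{\delta }\Vert _2^2 + c \cdot \max (z_y(\mathbf{x}+\mathbf{\delta }) - z_t(\mathbf{x}+\mathbf{\delta }) + \kappa , 0)\) for a toy linear model, and minimizes it with plain gradient descent instead of Adam. All constants are illustrative.

```python
import numpy as np

W = np.array([[2.0, 0.0], [0.0, 2.0]])   # toy linear model: logits z = W @ x
x = np.array([1.0, 0.0])                 # input currently labeled y = 0
y, t = 0, 1                              # attacker's target class t = 1
c, kappa, lr = 5.0, 0.5, 0.05            # trade-off weight, margin, step size

delta = np.zeros(2)
for _ in range(200):
    z = W @ (x + delta)
    grad = 2 * delta                     # gradient of the ||δ||² term
    if z[y] - z[t] + kappa > 0:          # hinge active: also push z_t above z_y
        grad = grad + c * (W[y] - W[t])  # gradient of c·(z_y − z_t) w.r.t. δ
    delta = delta - lr * grad

z = W @ (x + delta)
print(bool(z[t] > z[y]))                 # True: the perturbed input is "misclassified"
```

Driving the hinge term to zero makes the target logit exceed the true one by roughly the margin \(\kappa \), while the \(\Vert \mathbf{\delta }\Vert _2^2\) term keeps the perturbation small.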
3.2 Adversarial Training
Once an attacker finds an adversarial example, the algorithm can be retrained using this example. Researchers have found that retraining the model with adversarial examples produces a more robust model. For this section, we will work with attack algorithms that have a target label t (i.e., we are in the targeted misclassification case, as in JSMA or CW). Let \(\mathcal {A}(w,\mathbf{x},t)\) be the attack algorithm, where its inputs are as follows: \(w \in H\) is the current hypothesis, \(\mathbf{x}\) is the data point, and \(t \in {\mathcal C}\) is the target label. The output of \(\mathcal {A}(w,\mathbf{x},t)\) is a perturbation \(\mathbf{\delta }\) such that \(F(\mathbf{x}+\mathbf{\delta }) = t\). If the attack algorithm is simply a misclassification algorithm (e.g., FGSM or DeepFool), we will drop the last parameter t.
An adversarial training algorithm \(\mathcal {R}_{\mathcal {A}} (w,\mathbf{x},t)\) is parameterized by an attack algorithm \(\mathcal {A}\) and outputs a new hypothesis \(w' \in H\). Adversarial training works by taking a data point \(\mathbf{x}\) and an attack algorithm \(\mathcal {A}(w,\mathbf{x},t)\) as its input and then retraining the model using a specially designed loss function (essentially, one performs a single step of SGD using the new loss function). The question arises: what loss function should be used during the training? Different methods use different loss functions.
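A single retraining update of \(\mathcal {R}_{\mathcal {A}}\) can be sketched as: run the attack to get \(\mathbf{\delta }\), then take one SGD step on the loss evaluated at \(\mathbf{x}+\mathbf{\delta }\). The logistic model and the FGSM-style attack standing in for \(\mathcal {A}\) below are our own illustrations:

```python
import numpy as np

def sigmoid(u):
    return 1.0 / (1.0 + np.exp(-u))

def loss_grad_w(w, x, y):
    """Gradient of the logistic loss -log p(y | x) w.r.t. the weights w."""
    return (sigmoid(w @ x) - y) * x

def attack(w, x, y, eps=0.1):
    """Stand-in for A: an FGSM-style step along the input gradient."""
    grad_x = (sigmoid(w @ x) - y) * w
    return eps * np.sign(grad_x)

def adv_train_step(w, x, y, lr=0.5):
    delta = attack(w, x, y)                          # 1. craft the perturbation
    return w - lr * loss_grad_w(w, x + delta, y)     # 2. one SGD step at x + δ

w0 = np.array([0.5, -0.5])
w1 = adv_train_step(w0, np.array([1.0, 2.0]), 1)
```

After the step, the loss at the perturbed point decreases: the model is nudged to classify the adversarially perturbed input correctly.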
Next, we discuss some adversarial training algorithms proposed in the literature. At a high level, an important point is that the more sophisticated an adversarial perturbation algorithm is, the harder it is to turn it into adversarial training. The reason is that it is hard to “encode” the adversarial perturbation algorithm as an objective function and optimize it. We will see this below, especially for the virtual adversarial training (VAT) proposed by Miyato et al. [32].
 1.
Compute the second-order Taylor expansion of \(\varDelta (r, \mathbf{x}_i, w)\) at \(r = 0\), so that \(\varDelta (r, \mathbf{x}_i, w) \approx r^T H(\mathbf{x}_i, w) \; r\), where \(H(\mathbf{x}_i, w)\) is the Hessian matrix of \(\varDelta (r, \mathbf{x}_i, w)\) with respect to r at \(r = 0\).
 2. Thus \(\max _{\Vert r\Vert \le \delta }\varDelta (r, \mathbf{x}_i, w) = \max _{\Vert r\Vert \le \delta }\left( r^T H(\mathbf{x}_i, w) \; r \right) \). By the variational characterization of eigenvalues of symmetric matrices (\(H(\mathbf{x}_i, w)\) is symmetric), the maximizer is \(r^* = \delta \bar{v}\), where \(\bar{v} = \overline{v(\mathbf{x}_i, w)}\) is the unit eigenvector of \(H(\mathbf{x}_i, w)\) corresponding to its largest eigenvalue. Note that \(r^*\) depends on \(\mathbf{x}_i\) and w. Therefore the loss function becomes:$$\begin{aligned} \ell _{\mathrm{VAT}}(\theta , \mathbf{x}_i, y_i) = \ell (\theta , \mathbf{x}_i, y_i) + \lambda \varDelta (r^*, \mathbf{x}_i, w) \end{aligned}$$
 3. Now suppose that in the process of SGD we are at iteration t with model parameters \(w_t\), and we need to compute \(\left. \partial \ell _{\mathrm{VAT}} / \partial w \right| _{w=w_t}\). By the chain rule we would need to compute \(\left. \partial r^* / \partial w \right| _{w=w_t}\). However, the authors find that such gradients are volatile, so they instead fix \(r^*\) as a constant at the current parameters and compute$$\begin{aligned} \left. \frac{\partial \, \text {KL}\left( s(F_w)(\mathbf{x})[y],\, s(F_w)(\mathbf{x}+r^*)[y] \right) }{\partial w}\right| _{w=w_t} \end{aligned}$$
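Step 2 above amounts to computing the dominant unit eigenvector of the symmetric matrix \(H(\mathbf{x}_i, w)\). A sketch using power iteration follows; in VAT the matrix is never formed explicitly (Hessian-vector products are used instead), so the explicit toy matrix here is purely illustrative.

```python
import numpy as np

def dominant_unit_eigvec(H, iters=100, seed=0):
    """Power iteration: repeatedly apply H and renormalize, converging to the
    unit eigenvector of the largest-magnitude eigenvalue."""
    rng = np.random.default_rng(seed)
    v = rng.normal(size=H.shape[0])
    for _ in range(iters):
        v = H @ v
        v = v / np.linalg.norm(v)
    return v

H = np.array([[3.0, 1.0], [1.0, 3.0]])   # symmetric; eigenvalues 4 and 2
v_bar = dominant_unit_eigvec(H)          # converges to ±(1, 1)/√2
delta = 0.1
r_star = delta * v_bar                   # maximizer of r^T H r over ||r|| ≤ δ
```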
3.3 Black-Box Attacks
Recall that earlier attacks (e.g., FGSM and JSMA) needed white-box access to the classifier F (essentially because these attacks require first-order information about the classifier). In this section, we present black-box attacks, in which an attacker can only ask for the labels \(F(\mathbf{x})\) of certain data points. Our presentation is based on [36], but is more general.
Let \(\mathcal {A}(w,\mathbf{x},t)\) be the attack algorithm, where its inputs are: \(w \in H\) is the current hypothesis, \(\mathbf{x}\) is the data point, and \(t \in {\mathcal C}\) is the target label. The output of \(\mathcal {A}(w,\mathbf{x},t)\) is a perturbation \(\mathbf{\delta }\) such that \(F(\mathbf{x}+\mathbf{\delta }) = t\). If the attack algorithm is simply a misclassification algorithm (e.g., FGSM or DeepFool), we will drop the last parameter t (recall that in this case the attack algorithm returns a \(\mathbf{\delta }\) such that \(F(\mathbf{x}+\mathbf{\delta }) \not = F(\mathbf{x})\)). An adversarial training algorithm \(\mathcal {R}_{\mathcal {A}} (w,\mathbf{x},t)\) is parameterized by an attack algorithm \(\mathcal {A}\) and outputs a new hypothesis \(w' \in H\) (as discussed in the previous subsection).
Initialization: We pick a substitute classifier G and an initial seed data set \(S_0\), and train G. For simplicity, we will assume that the sample space \(Z = X \times Y\) and the hypothesis space H for G are the same as those of F (the classifier under attack); however, this is not crucial to the algorithm. We will call G the substitute classifier and F the target classifier. Let \(S=S_0\) be the initial data set, which will be updated as we iterate.
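The query/train/augment loop can be sketched as follows. The toy target F, the nearest-neighbor "substitute" G, and the random augmentation rule are all stand-ins for the actual networks and augmentation scheme:

```python
import numpy as np

def target_F(x):
    """Black-box target classifier: we can only observe its labels."""
    return int(x.sum() > 0)

rng = np.random.default_rng(1)
S = list(rng.normal(size=(10, 2)))                # initial seed set S_0

for _ in range(3):                                # a few query/augment rounds
    labeled = [(x, target_F(x)) for x in S]       # query F on every point in S
    # "Train" the substitute G: here a 1-nearest-neighbor lookup.
    def G(x, labeled=labeled):
        nearest = min(labeled, key=lambda p: float(np.linalg.norm(p[0] - x)))
        return nearest[1]
    # Augment S with perturbed copies of the current points.
    S = S + [x + 0.3 * rng.normal(size=2) for x in S]

# Measure how well G mimics F on fresh inputs.
agree = float(np.mean([G(x) == target_F(x) for x in rng.normal(size=(200, 2))]))
```

After a few rounds the substitute agrees with the target on most fresh inputs; white-box attacks crafted on G can then be transferred to F.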
3.4 Defenses
Defenses with formal guarantees against test-time attacks have proven elusive. For example, Carlini and Wagner [6] recently showed how to break ten recent defense proposals. However, defenses based on robust-optimization objectives have demonstrated promise [26, 33, 43]. Several techniques for verifying properties of a DNN (in isolation) have appeared recently (e.g., [12, 13, 19, 23]). Due to space limitations, we do not give a detailed account of all these defenses.
4 Semantic Adversarial Analysis and Training

Semantic Modification Space: Recall that the goal of adversarial attacks is to modify an input vector \(\mathbf{x}\) with an adversarial modification \(\mathbf{\delta }\) so as to achieve a target misclassification. Such modifications typically do not incorporate the application-level semantics or the context within which the neural network is deployed. We argue that it is essential to incorporate more application-level, contextual semantics into the modification space. Such semantic modifications correspond to modifications that may arise more naturally within the context of the target application. We view this not as ignoring arbitrary modifications (which are indeed worth considering with a security mindset), but as prioritizing the design and analysis of DNNs towards semantic adversarial modifications. Sect. 4.1 discusses this point in more detail.

System-Level Specifications: The goal of much of the work on adversarial attacks has been to generate misclassifications. However, not all misclassifications are made equal. We contend that it is important to find misclassifications that lead to violations of desired properties of the system within which the DNN is used. Therefore, one must identify such system-level specifications and devise analysis methods to verify whether an erroneous behavior of the DNN component can lead to the violation of a system-level specification. System-level counterexamples can be valuable aids to repair and redesign machine learning models. See Sect. 4.1 for a more detailed discussion of this point.

Semantic (Re)Training: Most machine learning models are trained with the main goal of reducing misclassifications as measured by a suitably crafted loss function. We contend that it is also important to train the model to avoid undesirable behaviors at the system level. For this, we advocate methods for semantic training, where system-level specifications, counterexamples, and other artifacts are used to improve the semantic quality of the ML model. Sect. 4.2 explores a few ideas.

Confidence-Based Analysis and Decision Making: Deep neural networks (and other ML models) often produce not just an output label, but also an associated confidence level. We argue that confidence levels should be used within the design of ML-based systems. They provide a way of exposing more information from the DNN to the surrounding system that uses its decisions. Such confidence levels can also be useful to prioritize analysis towards cases that are more egregious failures of the DNN. More generally, any explanations and auxiliary information generated by the DNN that accompany its main output decisions can be valuable aids in their design and analysis.
4.1 Compositional Falsification
We discuss the problem of performing system-level analysis of a deep learning component, using recent work by the authors [9, 10] to illustrate the main points. The material in this section is mainly based on [40].
Example Problem. As an illustrative example, consider a simple model of an Automatic Emergency Braking System (AEBS) that attempts to detect objects in front of a vehicle and actuate the brakes when needed to avert a collision. Figure 1 shows the AEBS as a system composed of a controller (automatic braking), a plant (vehicle subsystem under control, including transmission), and an advanced sensor (camera along with an obstacle detector based on deep learning). The AEBS, when combined with the vehicle’s environment, forms a closed-loop control system. The controller regulates the acceleration and braking of the plant using the velocity of the subject (ego) vehicle and the distance between it and an obstacle. The sensor used to detect the obstacle includes a camera along with an image classifier based on DNNs. In general, this sensor can provide noisy measurements due to incorrect image classifications, which in turn can affect the correctness of the overall system.
Suppose we want to verify whether the distance between the ego vehicle and a preceding obstacle is always larger than 2 m. In STL, this requirement \(\varPhi \) can be written as \(\mathtt {G}_{[0,T]} (\Vert \mathbf {x}_{\text {ego}} - \mathbf {x}_{\text {obs}}\Vert _2 \ge 2)\). Such verification requires the exploration of a very large input space comprising the control inputs (e.g., acceleration and braking pedal angles) and the machine learning (ML) component’s feature space (e.g., all the possible pictures observable by the camera). The latter space is particularly large—for example, note that the feature space of RGB images of dimension \(1000\times 600\) px (for an image classifier) contains \(256^{1000\times 600 \times 3}\) elements.
In the above example, \(S \Vert E\) is the closed-loop system in Fig. 1, where S comprises the DNN and the controller, and E comprises everything else. C is the DNN used for object detection and classification.
This case study has been implemented in Matlab/Simulink^{3} in two versions that use two different convolutional neural networks (CNNs): the Caffe [20] version of AlexNet [28] and the Inception-v3 model created with TensorFlow [31], both trained on the ImageNet database [1]. Further details about this example can be obtained from [9].
We formalize this approach while trying to emphasize the intuition. Let T denote the set of all possible traces of the composition of the system with its environment, \(S \Vert E\). Given a specification \(\varPhi \), let \(T_{\varPhi }\) denote the set of traces in T satisfying \(\varPhi \), and let \(U_{\varPhi }\) denote the projection of these traces onto the state and interface variables of the environment E. \(U_{\varPhi }\) is termed the validity domain of \(\varPhi \), i.e., the set of environment behaviors for which \(\varPhi \) is satisfied. Similarly, the complement set \(U_{\lnot \varPhi }\) is the set of environment behaviors for which \(\varPhi \) is violated.
 1.
The System-level Verifier initially performs two analyses with two extreme abstractions of the ML component. First, it performs an optimistic analysis, wherein the ML component is assumed to be a “perfect classifier”, i.e., all feature vectors are correctly classified. In situations where ML is used for perception/sensing, this abstraction assumes perfect perception/sensing. Using this abstraction, we compute the validity domain for this abstract model of the system, denoted \(U^+_{\varPhi }\). Next, it performs a pessimistic analysis where the ML component is abstracted by a “completely-wrong classifier”, i.e., all feature vectors are misclassified. Denote the resulting validity domain as \(U^-_{\varPhi }\). It is expected that \(U^+_{\varPhi } \supseteq U^-_{\varPhi }\).
Abstraction permits the System-level Verifier to operate on a lower-dimensional search space and identify a region in this space that may be affected by the malfunctioning of component C—a so-called “region of uncertainty” (ROU). This region, \(U^C_{ROU}\), is computed as \(U^+_{\varPhi } \setminus U^-_{\varPhi }\). In other words, it comprises all environment behaviors that could lead to a system-level failure when component C malfunctions. This region \(U^C_{ROU}\), projected onto the inputs of C, is communicated to the ML Analyzer. (Concretely, in the context of our example of Sect. 4.1, this corresponds to finding a subspace of images that corresponds to \(U^C_{ROU}\).)
 2.
The Component-level Analyzer, also termed the Machine Learning (ML) Analyzer, performs a detailed analysis of the projected ROU \(U^C_{ROU}\). A key aspect of the ML Analyzer is to explore the semantic modification space efficiently. Several options are available for such an analysis, including the various adversarial analysis techniques surveyed earlier (applied to the semantic space), as well as systematic sampling methods [9]. Even though a component-level formal specification may not be available, each of these adversarial analyses has an implicit notion of “misclassification.” We will refer to these as component-level errors. The working of the ML Analyzer from [9] is shown in Fig. 3.
 3.
When the Component-level (ML) Analyzer finds component-level errors (e.g., those that trigger misclassifications of inputs whose labels are easily inferred), it communicates that information back to the System-level Verifier, which checks whether the ML misclassification can lead to a violation of the system-level property \(\varPhi \). If yes, we have found a system-level counterexample. If no component-level errors are found, and the system-level verification can prove the absence of counterexamples, then it can conclude that \(\varPhi \) is satisfied. Otherwise, if the ML misclassification cannot be extended to a system-level counterexample, the ROU is updated and the revised ROU is passed back to the Component-level Analyzer.
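The optimistic and pessimistic analyses of step 1 can be sketched on a toy braking model with a gridded two-dimensional environment space (initial distance and speed). The dynamics, constants, and grid below are our own illustration, not the actual AEBS model:

```python
import numpy as np

def safe(d, v, detects, T=3.0, a=8.0):
    """Toy system-level property: the car stays ≥ 2 m from the obstacle over
    the horizon T. If the obstacle is detected, the car brakes immediately
    with deceleration a; otherwise it continues at speed v."""
    travelled = v**2 / (2 * a) if detects else v * T
    return d - travelled >= 2.0

dists = np.linspace(5, 60, 12)        # initial distance grid (m)
speeds = np.linspace(5, 25, 9)        # initial speed grid (m/s)

# Optimistic abstraction (perfect classifier) vs. pessimistic (always wrong).
U_plus = {(d, v) for d in dists for v in speeds if safe(d, v, True)}
U_minus = {(d, v) for d in dists for v in speeds if safe(d, v, False)}
ROU = U_plus - U_minus   # misclassification can matter only in this region
```

Only environment configurations inside the ROU need to be passed on to the ML Analyzer: outside it, the system-level outcome does not depend on whether the classifier is right.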
Sample Results. We have applied the above approach to the problem of compositional falsification of cyber-physical systems (CPS) with machine learning components [9]. For this class of CPS, including those with highly nonlinear dynamics and even black-box components, simulation-based falsification of temporal logic properties is an approach that has proven effective in industrial practice (e.g., [21, 46]). We present here a sample of results on the AEBS example from [9], referring the reader to more detailed descriptions in the other papers on the topic [9, 10].
For further details about this and other results with our approach, we refer the reader to [9, 10].
4.2 Semantic Training
In this section we discuss two ideas for semantic training and retraining of deep neural networks. We first discuss the use of hinge loss as a way of incorporating confidence levels into the training process. Next, we discuss how system-level counterexamples and associated misclassifications can be used in the retraining process both to improve the accuracy of ML models and to gain more assurance in the overall system containing the ML component. A more detailed study of using misclassifications (ML component-level counterexamples) to improve the accuracy of the neural network is presented in [11]; this approach is termed counterexample-guided data augmentation, inspired by counterexample-guided abstraction refinement (CEGAR) [7] and similar paradigms.
Experimental Setup. As in the preceding section, we consider an Automatic Emergency Braking System (AEBS) using a DNN-based object detector. However, in these experiments we use an AEBS deployed within Udacity’s self-driving car simulator, as reported in our previous work [10].^{4} We modified the Udacity simulator to focus exclusively on braking. In our case studies, the car follows some predefined waypoints, while acceleration and braking are controlled by the AEBS connected to a convolutional neural network (CNN). In particular, whenever the CNN detects an obstacle in the images provided by the onboard camera, the AEBS triggers a braking action that slows the vehicle down and avoids a collision with the obstacle.
Consider what happens as we vary k. Suppose there is an \(i \ne l\) such that \(\hat{y}_i > \hat{y}_l\), and pick the one with the largest \(\hat{y}_i\), calling it \(i^*\). For \(k=0\), we incur a loss of \(\hat{y}_{i^*} - \hat{y}_l\) for the example (x, y). However, as we make k more negative, we increase the tolerance for “misclassifications” produced by the DNN F. Specifically, we incur no penalty for a misclassification as long as the associated confidence level deviates from that of the ground-truth label by no more than \(|k|\). The larger the absolute value of k, the greater the tolerance. Intuitively, this biases the training process towards avoiding “high-confidence misclassifications”.
In this experiment, we investigate the role of k by exploring different parameter values. At training time, we minimize the mean hinge loss across all training samples. We trained the CNN described above with different values of k and evaluated its precision on both the original test set and a set of counterexamples generated for the original model, i.e., the network trained with cross-entropy loss.
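A minimal sketch of the k-tolerant hinge loss just described (the exact loss used in the experiments may differ in details): no penalty is incurred unless some wrong-class confidence exceeds the true-class confidence \(\hat{y}_l\) by more than \(|k|\) (with \(k \le 0\)).

```python
def hinge_loss_k(y_hat, label, k):
    """Hinge loss with tolerance |k| (k <= 0): penalize only when the best
    wrong-class score beats the true-class score by more than |k|."""
    worst = max(p for i, p in enumerate(y_hat) if i != label)
    return max(0.0, worst - y_hat[label] + k)

scores = [1.0, 3.0, 2.0]                 # per-class output scores, true label l = 2
print(hinge_loss_k(scores, 2, k=0.0))    # 1.0: penalized at k = 0
print(hinge_loss_k(scores, 2, k=-1.5))   # 0.0: the gap of 1.0 is within |k|
```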
Table 1. Hinge loss with different k values.

k        T_original            T_countex
         Acc    Log-loss       Acc    Log-loss
 0       0.69   0.68           0.11   0.70
−0.01    0.77   0.69           0.00   0.70
−0.05    0.52   0.70           0.67   0.69
−0.1     0.50   0.70           0.89   0.68
−0.25    0.51   0.70           0.77   0.68
Table 1 shows interesting results. We note that a negative k increases the accuracy of the model on counterexamples. In other words, biasing the training process by penalizing highconfidence misclassifications improves accuracy on counterexamples! However, the price to pay is a reduction of accuracy on the original test set. This is still a very preliminary result and further experimentation and analysis is necessary.
In an experiment, we augment the original training set with the elements of \(T_{countex}\), i.e., images of the original test set \(T_{original}\) that are misclassified by the original model (see Sect. 4.2).
We trained the model with both cross-entropy and hinge loss for 20 epochs. Both models achieve a high accuracy on the validation set (\({\approx }92\%\)). However, when plugged into the AEBS, neither of these models prevents the vehicle from colliding with the obstacle under an adversarial configuration. This seems to indicate that simply retraining with some semantic (system-level) counterexamples generated by analyzing the system containing the ML model may not be sufficient to eliminate all semantic counterexamples.
Interestingly, though, it appears that in both cases the impact of the vehicle with the obstacle happens at a slower speed than the one with the original model. In other words, the AEBS system starts detecting the obstacle earlier than with the original model, and therefore starts braking earlier as well. This means that despite the specification violations, the counterexample retraining procedure seems to help with limiting the damage in case of a collision. Coupled with a runtime assurance framework (see [41]), semantic retraining could help mitigate the impact of misclassifications on the systemlevel behavior.
5 Conclusion
In this paper, we surveyed the field of adversarial machine learning with a special focus on deep learning and on test-time attacks. We then introduced the idea of semantic adversarial (deep) machine learning, where adversarial analysis and training of ML models are performed using the semantics and context of the overall system within which the ML models are utilized. We identified several ideas for integrating semantics into adversarial learning, including using a semantic modification space, system-level formal specifications, training using semantic counterexamples, and utilizing more detailed information about the outputs produced by the ML model, including confidence levels, in the modules that use these outputs to make decisions. Preliminary experiments show the promise of these ideas, but also indicate that much remains to be done. We believe the field of semantic adversarial learning will be a rich domain for research at the intersection of machine learning, formal methods, and related areas.
Footnotes
 1.
The vectors are added component-wise.
 2.
In general, secondorder derivatives of a classifier corresponding to a DNN vanish at several points because several layers are piecewise linear.
 3.
 4.
Udacity’s self-driving car simulator: https://github.com/udacity/selfdrivingcarsim.
 5.
Socket.IO protocol: https://github.com/socketio.
Notes
Acknowledgments
The first and third authors were supported in part by NSF grant 1646208, the DARPA BRASS program under agreement number FA875016C0043, the DARPA Assured Autonomy program, and Berkeley Deep Drive.
References
 1. ImageNet. http://image-net.org/
 2. Alipanahi, B., Delong, A., Weirauch, M.T., Frey, B.J.: Predicting the sequence specificities of DNA- and RNA-binding proteins by deep learning. Nat. Biotechnol. 33, 831–838 (2015)
 3. Barreno, M., Nelson, B., Joseph, A.D., Tygar, J.D.: The security of machine learning. Mach. Learn. 81(2), 121–148 (2010)
 4. Bojarski, M., Del Testa, D., Dworakowski, D., Firner, B., Flepp, B., Goyal, P., Jackel, L., Monfort, M., Muller, U., Zhang, J., Zhang, X., Zhao, J., Zieba, K.: End to end learning for self-driving cars. Technical report (2016). CoRR, abs/1604.07316. http://arxiv.org/abs/1604.07316
 5. Carlini, N., Wagner, D.: Towards evaluating the robustness of neural networks. In: IEEE Symposium on Security and Privacy (2017)
 6. Carlini, N., Wagner, D.: Adversarial examples are not easily detected: bypassing ten detection methods. In: ACM Workshop on Artificial Intelligence and Security (2017)
 7. Clarke, E., Grumberg, O., Jha, S., Lu, Y., Veith, H.: Counterexample-guided abstraction refinement. In: Emerson, E.A., Sistla, A.P. (eds.) CAV 2000. LNCS, vol. 1855, pp. 154–169. Springer, Heidelberg (2000). https://doi.org/10.1007/10722167_15
 8. Dahl, G.E., Stokes, J.W., Deng, L., Yu, D.: Large-scale malware classification using random projections and neural networks. In: Proceedings of the IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pp. 3422–3426. IEEE (2013)
 9. Dreossi, T., Donzé, A., Seshia, S.A.: Compositional falsification of cyber-physical systems with machine learning components. In: Barrett, C., Davies, M., Kahsai, T. (eds.) NFM 2017. LNCS, vol. 10227, pp. 357–372. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-57288-8_26
 10. Dreossi, T., Donzé, A., Seshia, S.A.: Compositional falsification of cyber-physical systems with machine learning components. CoRR, abs/1703.00978 (2017)
 11. Dreossi, T., Ghosh, S., Yue, X., Keutzer, K., Sangiovanni-Vincentelli, A., Seshia, S.A.: Counterexample-guided data augmentation. In: International Joint Conference on Artificial Intelligence (IJCAI), July 2018
 12. Dutta, S., Jha, S., Sankaranarayanan, S., Tiwari, A.: Output range analysis for deep neural networks (2018, to appear)
 13. Dvijotham, K., Stanforth, R., Gowal, S., Mann, T., Kohli, P.: A dual approach to scalable verification of deep networks. ArXiv e-prints, March 2018
 14. Eddy, N.: AI, machine learning drive autonomous vehicle development (2016). http://www.informationweek.com/big-data/big-data-analytics/ai-machine-learning-drive-autonomous-vehicle-development/d/d-id/1325906
 15. Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press (2016). http://www.deeplearningbook.org
 16. Goodfellow, I.J., Shlens, J., Szegedy, C.: Explaining and harnessing adversarial examples. In: Proceedings of the 2015 International Conference on Learning Representations. Computational and Biological Learning Society (2015)
 17. Hinton, G., Deng, L., Dong, Y., Dahl, G.E., Mohamed, A., Jaitly, N., Senior, A., Vanhoucke, V., Nguyen, P., Sainath, T.N., et al.: Deep neural networks for acoustic modeling in speech recognition: the shared views of four research groups. IEEE Signal Process. Mag. 29(6), 82–97 (2012)
 18. Huang, L., Joseph, A.D., Nelson, B., Rubinstein, B.I.P., Tygar, J.D.: Adversarial machine learning. In: Proceedings of the 4th ACM Workshop on Security and Artificial Intelligence, pp. 43–58. ACM (2011)
 19. Huang, X., Kwiatkowska, M., Wang, S., Wu, M.: Safety verification of deep neural networks. In: Majumdar, R., Kunčak, V. (eds.) CAV 2017. LNCS, vol. 10426, pp. 3–29. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-63387-9_1
 20. Jia, Y., Shelhamer, E., Donahue, J., Karayev, S., Long, J., Girshick, R., Guadarrama, S., Darrell, T.: Caffe: convolutional architecture for fast feature embedding. In: ACM Multimedia Conference (ACM-MM), pp. 675–678 (2014)
 21. Jin, X., Donzé, A., Deshmukh, J., Seshia, S.A.: Mining requirements from closed-loop control models. IEEE Trans. Comput.-Aided Des. Circuits Syst. 34(11), 1704–1717 (2015)
 22. Julian, K., Lopez, J., Brush, J., Owen, M., Kochenderfer, M.: Policy compression for aircraft collision avoidance systems. In: Proceedings of the 35th Digital Avionics Systems Conference (DASC) (2016)
 23. Katz, G., Barrett, C., Dill, D.L., Julian, K., Kochenderfer, M.J.: Reluplex: an efficient SMT solver for verifying deep neural networks. In: Majumdar, R., Kunčak, V. (eds.) CAV 2017. LNCS, vol. 10426, pp. 97–117. Springer, Cham (2017). https://doi.org/10.1007/978-3-319-63387-9_5
 24. Kingma, D.P., Ba, J.: Adam: a method for stochastic optimization (2017). https://arxiv.org/abs/1412.6980
 25. Knorr, E.: How PayPal beats the bad guys with machine learning (2015). http://www.infoworld.com/article/2907877/machine-learning/how-paypal-reduces-fraud-with-machine-learning.html
 26. Kolter, J.Z., Wong, E.: Provable defenses against adversarial examples via the convex outer adversarial polytope. CoRR, abs/1711.00851 (2017)
 27. Koymans, R.: Specifying real-time properties with metric temporal logic. Real-Time Syst. 2(4), 255–299 (1990)
 28. Krizhevsky, A., Sutskever, I., Hinton, G.E.: ImageNet classification with deep convolutional neural networks. In: Advances in Neural Information Processing Systems, pp. 1097–1105 (2012)
 29. Lee, E.A., Seshia, S.A.: Introduction to Embedded Systems: A Cyber-Physical Systems Approach, 2nd edn. MIT Press, Cambridge (2016)
 30. Maler, O., Nickovic, D.: Monitoring temporal properties of continuous signals. In: Lakhnech, Y., Yovine, S. (eds.) FORMATS/FTRTFT 2004. LNCS, vol. 3253, pp. 152–166. Springer, Heidelberg (2004). https://doi.org/10.1007/978-3-540-30206-3_12
 31. Abadi, M., et al.: TensorFlow: large-scale machine learning on heterogeneous systems (2015). Software: tensorflow.org
 32. Miyato, T., Maeda, S., Koyama, M., Nakae, K., Ishii, S.: Distributional smoothing by virtual adversarial examples. CoRR, abs/1507.00677 (2015)
 33. Madry, A., Makelov, A., Schmidt, L., Tsipras, D., Vladu, A.: Towards deep learning models resistant to adversarial attacks. In: ICLR (2018)
 34. Nocedal, J., Wright, S.: Numerical Optimization. Springer, New York (2006). https://doi.org/10.1007/978-0-387-40065-5
 35. NVIDIA: NVIDIA Tegra Drive PX: Self-driving Car Computer (2015)
 36. Papernot, N., McDaniel, P., Goodfellow, I., Jha, S., Celik, Z.B., Swami, A.: Practical black-box attacks against machine learning. In: Proceedings of the 2017 ACM Asia Conference on Computer and Communications Security (AsiaCCS), April 2017
 37. Papernot, N., McDaniel, P., Jha, S., Fredrikson, M., Celik, Z.B., Swami, A.: The limitations of deep learning in adversarial settings. In: Proceedings of the 1st IEEE European Symposium on Security and Privacy. arXiv preprint arXiv:1511.07528 (2016)
 38. Pennington, J., Socher, R., Manning, C.D.: GloVe: global vectors for word representation. In: Proceedings of the Empirical Methods in Natural Language Processing (EMNLP 2014), vol. 12, pp. 1532–1543 (2014)
 39. Russell, S., Dietterich, T., Horvitz, E., Selman, B., Rossi, F., Hassabis, D., Legg, S., Suleyman, M., George, D., Phoenix, S.: Letter to the editor: research priorities for robust and beneficial artificial intelligence: an open letter. AI Mag. 36(4), 3–4 (2015)
 40. Seshia, S.A.: Compositional verification without compositional specification for learning-based systems. Technical report UCB/EECS-2017-164, EECS Department, University of California, Berkeley, November 2017
 41. Seshia, S.A., Sadigh, D., Sastry, S.S.: Towards verified artificial intelligence. ArXiv e-prints, July 2016
 42. Shin, E.C.R., Song, D., Moazzezi, R.: Recognizing functions in binaries with neural networks. In: 24th USENIX Security Symposium (USENIX Security 2015), pp. 611–626 (2015)
 43. Sinha, A., Namkoong, H., Duchi, J.: Certifiable distributional robustness with principled adversarial training. In: ICLR (2018)
 44. Steinhardt, J., Koh, P.W., Liang, P.: Certified defenses for data poisoning attacks. In: Advances in Neural Information Processing Systems (NIPS) (2017)
 45. Tramèr, F., Zhang, F., Juels, A., Reiter, M., Ristenpart, T.: Stealing machine learning models via prediction APIs. In: USENIX Security (2016)
 46. Yamaguchi, T., Kaga, T., Donzé, A., Seshia, S.A.: Combining requirement mining, software model checking, and simulation-based verification for industrial automotive systems. In: Proceedings of the IEEE International Conference on Formal Methods in Computer-Aided Design (FMCAD), October 2016
Copyright information
Open Access This chapter is licensed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons license and indicate if changes were made.

The images or other third party material in this chapter are included in the chapter's Creative Commons license, unless indicated otherwise in a credit line to the material. If material is not included in the chapter's Creative Commons license and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder.