Introduction

Deep learning algorithms such as deep neural networks (DNNs) [1] have emerged in the last decade for many applications, such as image recognition and voice recognition. The high prediction accuracy and low false positive rates achieved by these models have encouraged researchers to apply DNNs to safety-critical applications such as autonomous vehicles, malware detection and face recognition. Today, industry is beginning to develop autonomous, or self-driving, vehicles that do not require human intervention [2]. Safety is one of the main concerns in designing self-driving vehicles. Autonomous vehicles bring many benefits, such as saving time, increasing human safety, reducing traffic congestion, reducing carbon emissions, reducing death rates, and decreasing fuel consumption [1, 2]. For example, IHS Markit forecasts that by 2040, autonomous vehicle sales will exceed 33 million. Society of Automotive Engineers (SAE) International has introduced six levels for classifying automated systems in vehicles, ranging from level 0 (no automation) to level 5 (full automation). Level 4 (high automation) is already on the market, with services such as Google Waymo [3] and TuSimple [4] available to the public, while level 5 (full automation) is still under testing [5].

The safety of autonomous vehicles’ driving depends on the robustness of their DNN classifiers. DNN classifiers depend heavily on sensors for high-accuracy object detection. Correct interpretation leads to correct decisions [6]. This means that DNN classifiers have to be impervious to any small modifications in input images that could otherwise lead them to misclassify objects [7]. This raises many concerns and challenges for researchers regarding DNN reliability and security. One of the most serious security issues in DNNs is the threat of adversarial attacks, also known as adversarial examples.

The main goal of adversarial DNN attacks is to exploit DNN vulnerabilities and generate an adversarial image capable of fooling DNNs into producing incorrect predictions [8,9,10,11]. An adversarial attack on an image classification model is considered successful if the generated adversarial image is classified by the target DNN as a different class label (not the correct image class) with a high confidence rate [12]. In its simplest mathematical form, the adversarial image is x̄ = x + ϵ, where x is the clean image, ϵ is a small perturbation, and x̄ is the resulting adversarial image.

Many attack strategies have been developed to fool DNN models in various domains and applications. Examples of these attacks are DeepSearch [13], greedy local search [12], the Fast Gradient Sign Method (FGSM) [14] and Projected Gradient Descent (PGD) [7]. In addition, various recent studies have proposed a number of defenses for increasing the security and robustness of DNN models [15,16,17,18]. Increasing DNN robustness and protection against adversarial attacks is a growing research challenge.

The potential risks associated with the DNN classifiers mentioned above will affect the development of autonomous vehicles and their deployment in industry. If autonomous vehicles cannot ensure human safety on the road, consumers will not accept this technology. Therefore, it is essential to determine whether deep learning systems in autonomous vehicles are vulnerable, how they could be attacked, how much damage could be caused by such attacks and what measures have been proposed to defend against these attacks. The industry needs this analysis and information to improve the safety and robustness of DNNs.

The motivation for conducting this survey was the realization that the world may someday depend on autonomous vehicles to make life easier [17]. However, autonomous vehicles have experienced high rates of accidents compared with human-driven vehicles, though these accidents involve fewer injuries. According to the National Law Review, there are 9.1 autonomous vehicle accidents per million miles driven on average [19]. The main reason for this high rate is the frequency of attacks on DNN classifier systems in autonomous vehicles [20].

Several researchers, including [7, 12, 21], have recently presented survey papers concerning adversarial attacks on autonomous vehicles. These attacks can be digital or physical. Digital attacks aim to find the image pixels whose modification produces new fake images, while physical attacks aim to change the environment in which the autonomous vehicle operates. Deng et al. [12] provided a survey investigating various types of digital adversarial attack techniques. Examples of digital attacks include the iterative targeted fast gradient sign method (IT-FGSM) [22], the optimization universal adversarial perturbation (Opt-uni) [23], AdvGAN [24], and AdvGAN-uni [24]. Another survey, conducted by Modas et al. [7], discusses physical adversarial attacks on autonomous vehicles and their countermeasures. Examples of physical attacks are patch or sticker attacks [25], ultrasonic attacks [26, 27] and lidar attacks [28]. However, there is still a lack of systematic surveys on DNN adversarial attack and defense techniques and DNN behavioral tests in the digital and physical autonomous vehicle environment. In addition, there is an urgent need to combine these techniques into one survey to guide researchers toward future improvements. Therefore, this survey on DNN adversarial examples aims to reveal potential threats in autonomous vehicles’ physical and digital environments and to encourage researchers to deploy defense techniques in advance. Artificial intelligence security has also become an important research direction. The contributions of this survey are as follows:

  • This survey summarizes the generation algorithms that have been presented to formalize DNN adversarial attacks in autonomous vehicles’ digital and physical environments. We also discuss how adversarial attacks generated in the digital environment can be applied in the physical environment.

  • We investigate and discuss the latest critical defense techniques against DNN adversarial attacks in the autonomous vehicle environment, provide descriptions of these techniques, and explain the main observations of these methods.

  • We provide taxonomies for DNN adversarial attacks, defenses and DNN behavioral tests to systematically analyze these techniques.

  • We discuss the issues with the existing research on DNN behavioral tests, adversarial attacks and defenses, and, based on this, recommend future work.

In the following sections, we give an overview on autonomous vehicles and deep learning algorithms and their roles in autonomous vehicle technology. Next, we present DNN vulnerabilities. Then, we discuss the adversarial attacks taxonomy as well as the DNN defense taxonomy and DNN testing frameworks. Finally, we identify and present the challenges and opportunities for future work on DNN adversarial attacks, defenses, and testing frameworks.

Main text

The remainder of this paper is organized as follows: “DNN in Autonomous Vehicles” section introduces the aspects of autonomous vehicles and the background needed to understand the DNN concepts discussed throughout this paper. “Adversarial attack taxonomy” section presents the adversarial attack taxonomy. “Defenses taxonomy” section presents the taxonomy of DNN defenses. “DNN evaluation framework” section presents the criteria that have been used to test and evaluate DNNs. “Discussion and future research directions” section discusses challenges and potential directions for future research on adversarial attacks. Finally, “Conclusions” section concludes the paper.

DNN in autonomous vehicles

This section presents information on autonomous vehicle technology, the background needed to understand deep learning concepts, and reinforcement learning.

Autonomous vehicles

Autonomous vehicles employ artificial intelligence techniques such as intelligent agents. An agent perceives its environment through sensors, decides on suitable actions and applies these actions in the environment through actuators. Automated driving technology consists of three basic functional layers: a sensing layer, a perception layer and a control (decision) layer. The sensing layer has many types of sensors, including lidar, camera, and radar sensors. These sensors are located in the front and back of the vehicle. Lidar sensors are used for many purposes, such as object detection, during which the sensor detects light waves reflected by the objects in the surrounding environment. Camera sensors are used to capture video of the surrounding environment and images such as road signs. Radar sensors are used for simulating vision, monitoring weather conditions and avoiding road objects. The sensors are responsible for collecting data from the environment. The perception layer contains DNNs that analyze and interpret the data collected by the sensors and extract logical information, as shown in Fig. 1. This layer takes the input signals and processes them to make sense of the information they provide to the system. The decision layer is responsible for decision-making, such as routing, and controls driving, self-parking, determination of the steering angle and lane detection [1, 29].

Fig. 1 Overview of the DNN role in an autonomous driving model [12]

SAE International has introduced six levels for classifying automated systems in vehicles. These levels range from level 0 (no automation) to level 5 (full automation). Table 1 presents an overview of these levels and the criteria that distinguish them. For more information about automated vehicles, see [29].

Table 1 The six levels of vehicle automation systems

Deep neural networks overview

Deep learning is a subset of machine learning in which the abstraction of the underlying knowledge is learned from the dataset. The main difference between classical machine learning algorithms and DNNs is that DNNs do not require domain knowledge or feature engineering because they are end-to-end learning processes. This learning process, referred to as representation learning, makes the learned features more transferable to other models. Deep learning algorithms use multiple layers to learn data features. Moreover, these algorithms require immense datasets and vast computing resources because the efficiency of DNNs depends on the dataset size. DNNs in autonomous vehicles can be classified into two main types [30]:

  1. Feed-forward neural networks (FNNs)

  2. Recurrent neural networks (RNNs)

The main difference between feed-forward neural networks (FNNs) and recurrent neural networks (RNNs), as shown in Fig. 2, is that FNNs do not retain the values of the last layer neurons, which means that the neuron values propagate in one direction. This feed-forward operation makes FNNs more suitable for non-sequential applications, such as images. RNNs, meanwhile, memorize the output of the last layer of neurons, which renders them suitable for sequential applications, such as audio [31].

Fig. 2 The difference between RNNs and FNNs

FNN

Convolutional neural networks (CNNs) are examples of FNNs. The CNN architecture has two parts: fully connected layers and convolutional layers. In a fully connected network of layers, every neuron in one layer is connected to every neuron in the next layer. The typical architecture consists of an input layer, one or more hidden layers and an output layer. The input layer is responsible for passing the input data to the hidden layers. The hidden layer or layers are responsible for extracting the features and for information analysis. The output layer is responsible for predicting the input class. For example, as shown in Fig. 3, each neuron (blue circle) in any layer Li is connected to every neuron in the next layer Li+1 (red edges). This structure applies to each neuron in each layer, except the last (output) layer. The hidden layers are denoted in Fig. 3 as h1 and h2. The circles represent neurons, and the edges represent the inner product between the corresponding weights and the previous layer’s neurons. If the result of this multiplication is high, the information or feature relevant to this neuron is important and the neuron should be activated. The convolutional layers have neurons that are connected only to certain neurons in the next layer, and the same weights are shared among different connections for different neurons [32].

Fig. 3 The simple architecture of fully connected layers
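To make the forward pass through fully connected layers concrete, the following minimal NumPy sketch (with hypothetical layer sizes and random weights) computes the inner product of each layer's weights with the previous layer's neurons and applies a ReLU activation in the hidden layers:

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

def dense_forward(x, weights, biases):
    """Forward pass through fully connected layers: every neuron combines all
    neurons of the previous layer via an inner product, then an activation."""
    a = x
    for W, b in zip(weights[:-1], biases[:-1]):
        a = relu(a @ W + b)                  # hidden layers h1, h2, ...
    return a @ weights[-1] + biases[-1]      # output layer: class scores (logits)

# hypothetical network: 784 inputs -> h1 (128) -> h2 (64) -> 10 classes
rng = np.random.default_rng(0)
sizes = [784, 128, 64, 10]
weights = [rng.normal(scale=0.05, size=(m, n)) for m, n in zip(sizes[:-1], sizes[1:])]
biases = [np.zeros(n) for n in sizes[1:]]
logits = dense_forward(rng.random(784), weights, biases)
print(logits.argmax())                       # index of the predicted class
```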

RNN

The architecture of RNNs is shown in Fig. 4. In addition to the operations performed in CNNs, RNNs allow the memorized output of the previous step to be considered in the calculation of the current prediction; that is, the neuron output is fed back into the network as well as forward to the next layer. As a result, the prediction for the current input also depends on the previous input. This approach is helpful for many applications, such as video frame sequences, where the current frame is predicted with the help of the previous frame [30].

Fig. 4 The basic architecture of RNNs [30]
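The recurrence can be illustrated with a minimal NumPy sketch (hypothetical weight matrices and sizes): the memorized hidden state from the previous step enters the computation of the current step.

```python
import numpy as np

def rnn_step(x_t, h_prev, W_xh, W_hh, b_h):
    """One recurrent step: the new hidden state depends on the current input
    x_t and on the memorized hidden state h_prev from the previous step."""
    return np.tanh(x_t @ W_xh + h_prev @ W_hh + b_h)

rng = np.random.default_rng(0)
W_xh = rng.normal(scale=0.1, size=(8, 16))   # input-to-hidden weights
W_hh = rng.normal(scale=0.1, size=(16, 16))  # hidden-to-hidden (memory) weights
b_h = np.zeros(16)

h = np.zeros(16)                             # no memory before the first frame
for x_t in rng.random((5, 8)):               # e.g., five consecutive audio/video frames
    h = rnn_step(x_t, h, W_xh, W_hh, b_h)    # each prediction depends on the past
```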

Reinforcement learning

The agent in reinforcement learning (RL) learns how to improve its behavior in a certain environment by interacting with that environment. Unlike supervised learning, the relation and mapping from the input to the output is not told explicitly to the agent. Instead, the agent learns through trial and error, using a reward function to evaluate its actions and update its behavior [33,34,35]. There are several deep reinforcement learning (DRL) approaches for autonomous vehicles’ decision making in adversarial settings, such as those in [36, 37]. A framework was proposed in [38] to evaluate an adversarial agent based on DRL; the aim was to measure the reliability of autonomous vehicles’ mechanisms for collision avoidance and motion planning. Another study [39] proposed a defense that extends two game-theoretic algorithms (robust adversarial reinforcement learning and neural FSP) to a semi-competitive game environment. However, in the following sections we concentrate on DNNs.

Adversarial attack taxonomy

In this section, we first present DNN adversarial vulnerabilities and then discuss adversarial image generation methods. Finally, we present the adversary’s means, the adversary’s goals, and the adversary’s knowledge along with studies on these topics.

DNN adversarial vulnerabilities

The existence of adversarial images for DNNs was demonstrated by Goodfellow et al. [14] in 2014. Many vulnerabilities exist in DNNs, leading to adversarial attacks. Below, we discuss some of these vulnerabilities.

DNN decision boundary vulnerability

The basic layout of DNNs is to take a raw image as input and output the correct classification label. The classification can be either a binary label or a multiclass label. DNNs consist of hidden layers with weights and activation functions that can recognize the underlying object structure. As mentioned in the “Deep neural networks overview” section, DNNs are end-to-end learning processes. This characteristic opens the door for adversaries to exploit DNN vulnerabilities and generate new methods of attack [40].

To understand adversarial attacks on DNNs, consider Fig. 5, which shows a binary model that can classify the input as an orange region (class 1) or a yellow region (class 2). The distribution of the data corresponds to these two classes. The cross points (x) correspond to the images used to train the DNN model. The red line corresponds to the decision boundary learned during the training phase to produce the final classification label.

Fig. 5 a The nature of DNN training data with decision boundary, b adversarial point within orange class but crossing DNN decision boundary, c moving the point of the input image from point x to point O to fool the DNN model into misclassifying the input image

The decision boundary means that the images under the line will be predicted as class 1 and that those above the line will be predicted as class 2. As shown in the figure, the decision boundary does not fit the training data exactly, which avoids an overfitting problem that would cause the DNN model to predict well on the training data but poorly on the test data. The adversary exploits this limitation of the learned decision boundary and finds a small perturbation (point O) that falls within the orange region but crosses the decision boundary of the DNN model. The adversary tries to move point x across the decision boundary to reach point O. Thus, the adversary fools the model into misclassifying points as class 2 when they actually belong to class 1. Therefore, one of the reasons for the appearance of adversarial images is the decision boundary vulnerability of the trained DNN model [18]. To explain further, in the data distribution of the trained model, each class has its own region and boundary, and the decision boundary limits the distance between data points within the same class. This margin needs to be maintained so that data points cannot easily be moved into other regions [28].

DNN transferability vulnerability

Another reason for the appearance of adversarial attacks is the DNN transferability property [41]. Consider two models, model A and model B, with the same domain and classification task. Adversarial images generated for model A can be fed to model B and lead model B to misclassify them [42, 43]. Research has shown that in this way, vulnerability can be transferred from an insecure model to a secure model [8, 9, 14, 44].

To build robust DNN models that resist adversarial attacks, we need to understand the reasons behind adversarial images and fully understand the structure of DNNs.

Adversarial image generation methods

In this subsection, we discuss two main ways attackers modify original images to produce adversarial images: image distance metric and image transformation.

Image distance metric

Consider a classifier G and a clean sample image x with its true label y. The adversary’s goal is to find and generate a synthesized image x̄ that looks perceptually similar to x but still leads the classifier to misclassify the image as a different, incorrect class t [40]. The most widely employed norms and their meanings are shown in Table 2 [9, 51, 52]. Various studies have used different norms; for example, the one-pixel attack [53] constrains the L0 norm to limit the number of pixels that can be changed in the clean image. The allowed perturbation in this case is a single pixel, which is enough to generate an adversarial image.

Table 2 Most popular distance norm used in adversarial images
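As a quick illustration of these distance measures, the sketch below (assuming images stored as NumPy arrays of the same shape) computes the L0, L2 and L∞ distances between a clean image and its adversarial counterpart:

```python
import numpy as np

def perturbation_norms(x, x_adv):
    """Distances between a clean image x and its adversarial version x_adv
    under the norms commonly used to bound perturbations."""
    d = (x_adv.astype(np.float64) - x.astype(np.float64)).ravel()
    return {
        "L0": int(np.count_nonzero(d)),    # number of changed pixels (the one-pixel attack bounds this)
        "L2": float(np.linalg.norm(d)),    # Euclidean size of the overall change
        "Linf": float(np.abs(d).max()),    # largest change applied to any single pixel
    }
```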

Image transformation

Adversarial images can be generated through one of two methods of image transformation [30, 40, 54]:

  1. Image pixel transformation, which focuses on changing the pixel values of the original (clean) image. Examples include changing the pixel color depth and brightness (referred to as a semantic attack). Traditional adversarial attack generation keeps the perturbation within a small Lp-norm bound, whereas the newest attacks, referred to as semantic attacks, allow perturbations that exceed the Lp-norm bound while remaining semantically plausible [9, 52].

  2. Image affine transformation, which focuses on spatial modification. Examples of affine transformations are rotation, translation, and scaling (see the sketch below).
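The sketch below illustrates both kinds of transformation using Pillow: a pixel-value change (brightness) and two affine changes (rotation and scaling). The file path is a placeholder.

```python
from PIL import Image, ImageEnhance

def transformation_variants(path):
    """Generate simple transformed variants of a clean image."""
    img = Image.open(path)
    brighter = ImageEnhance.Brightness(img).enhance(1.4)    # pixel transformation: brightness
    rotated = img.rotate(15)                                # affine transformation: rotation by 15 degrees
    scaled = img.resize((img.width // 2, img.height // 2))  # affine transformation: down-scaling
    return brighter, rotated, scaled
```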

Threat model

The characteristics of an adversarial attack depend on the adversary’s capabilities. This means that each adversarial attack can be classified by the adversary’s means, goal or knowledge about the target DNN. As shown in Fig. 6, we present the adversarial threat model, which organizes the attacks that target DNN models. We discuss these categories in more detail in the next subsections.

Fig. 6 Adversarial threat model

Adversary means

Adversarial attacks can be generated using physical modifications or digital modifications, as shown in Fig. 7.

Fig. 7 Digital and physical adversarial attacks

Physical attack

In a physical attack, the perturbator changes the environment of the application domain. For example, in autonomous vehicles, the adversary may alter a stop sign so that it is classified as a 45-mph speed limit sign [25]. The physical modification can be carried out using items such as stickers and printed posters. Fooling the DNN and misleading the model can lead to incorrect decisions such as ignoring a stop sign, causing accidents and compromising human safety [17]. Another type of adversarial attack on autonomous vehicles exploits the camera viewing angle: the training set images are captured at fixed angles [55], so a DNN that recognizes a motorbike correctly at those angles may misclassify it as a person once the camera viewing angle is changed and the image is rotated [56].

Digital attack

In a digital attack, perturbation is employed after the DNN’s weaknesses are explored by altering some of the predefined pixels of the input image. The purpose is to modify the input image in such a way that the model will misclassify it [7]. An adversarial attack in an image classification model is considered successful if the generated adversarial image is classified by the target DNN as a different class (not the correct image class) with a high confidence rate [12].

Adversary goal

Based on the adversary’s goals, an adversarial attack can be targeted or non-targeted.

Targeted attack

In a targeted attack, the adversary’s goal is to force the DNN model to change the correct class label of an input image to a specific, different target class label.

Non-targeted attack

In a non-targeted attack, the adversary’s goal is to force the DNN model to change the correct class label to any other class label [40].

Adversary knowledge

An adversarial attack can be categorized as one of two types, based on the adversary’s knowledge: a white-box attack or a black-box attack.

Black-box attacks

In a black-box attack, the structure of the deep neural network is not known to the adversary. The algorithm parameters are unknown as well. This makes the attack more challenging for the adversary. The adversary tries to estimate the gradients of the target DNN model in order to produce an adversarial image. The adversary can access the output of the model and query the target model to obtain the probability scores of all classes. Examples of these attacks are Zeroth-Order Optimization (ZOO) [57], DeepSearch [13], and greedy local search [58].

One of the most famous classical strategies in black-box attacks is ZOO. In this type of attack [57], the adversary performs multiple forward passes on the target model to estimate the gradient. As shown in Eq. 1, the model output f is queried on the clean image x shifted by a small step h along the coordinate direction e_i, and the symmetric difference of the resulting scores (e.g., the logits or class probabilities of the model) approximates the gradient coordinate by coordinate.

$$\frac{\partial f(x)}{\partial {x}_i}\approx \frac{f\left(x+h\,{e}_i\right)-f\left(x-h\,{e}_i\right)}{2h}$$
(1)
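A minimal NumPy sketch of this symmetric-difference estimate, assuming only query access to a scalar score function f (for example, the probability or logit of the current top class); only the requested coordinates are probed, since querying every pixel is expensive:

```python
import numpy as np

def estimate_gradient(f, x, coords, h=1e-4):
    """ZOO-style black-box gradient estimate: for each chosen coordinate i,
    query f at x + h*e_i and x - h*e_i and take the symmetric difference."""
    grad = np.zeros_like(x, dtype=np.float64)
    for i in coords:
        e_i = np.zeros_like(x, dtype=np.float64)
        e_i.flat[i] = 1.0
        grad.flat[i] = (f(x + h * e_i) - f(x - h * e_i)) / (2.0 * h)
    return grad
```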

Amin et al. [52] proposed a shadow attack that targets certified defenses. The shadow attack successfully broke the randomized smoothing [59] and CROWN interval-bound propagation [60] defenses. The shadow attack is a generalization of the PGD attack and concentrates on generating an adversarial example with a spoofed certificate. The attack algorithm changes the image brightness or darkness with a small change in color depth. However, this attack was designed specifically for untargeted attacks, the computational cost was not discussed, and the attack was not tested on road sign images.

Additionally, Hamdi et al. [56] discussed semantic adversarial diagnostic attacks (SADA), which involve changes that are likely to occur naturally, such as a change in camera viewpoint, lighting conditions or other aspects. Semantic attacks are difficult to understand, diagnose, analyze, and study because investigating real-world semantic attacks is not an easy task, and generating the parameters of the semantic condition is complicated. Hamdi et al. proposed an algorithm and a general setup for the adversarial attack. This algorithm was designed to learn the underlying distribution of semantic adversarial attacks. The proposed general setup includes an entity called the adversary. The interaction between the agent and the adversary takes place through the environment. The adversary tries to give an input to the environment such that the agent will fail in that environment. The adversary then receives a score from the agent so that it can update itself and increase its attack rates in the future. This attack can be generated on a dataset of images (pixels) or on semantic parameters. The setup Hamdi et al. created covered three applications: object detection, self-driving and unmanned aerial vehicle racing. They used a YOLOv3 object detector as the agent of their SADA framework. SADA can be used as an attack scheme as well as a diagnostic tool to assess the systematic failure of agents. However, this attack focuses on 2D images within neural networks.

White-box attacks

In a white-box attack, the adversary targets aspects of the DNN such as its structure, parameters and gradients. The adversary in this type of attack can obtain all the information needed to build an adversarial attack able to fool the target system [40, 42]. Examples of these attacks are FGSM [14], PGD [7, 61], Carlini and Wagner (CW) attacks [9] and Jacobian-based saliency map attacks [46].

One of the most famous classical strategies in white-box attacks is FGSM. In this type of attack [14], the adversary computes an adversarial image by adding a pixel perturbation of magnitude ϵ in the direction of the gradient. This computation is done as shown in Eq. 2: the attacker takes the gradient of the loss with respect to the input image x, finds its sign, and adds ϵ in the direction of that sign to the input image to generate an adversarial image. The goal is to add a perturbation that increases the loss. The attack requires only one update to achieve an untargeted attack (an adversarial image).

$${x}_{\textrm{adv}}=x+\varepsilon \cdot \operatorname{sign}\left(\nabla_x L\left(x,{y}_{\textrm{true}}\right)\right)$$
(2)

In a targeted attack, the procedure is the same. However, the perturbation is added in the negative direction of the gradient with respect to the input image x, and the loss is computed with respect to the target label ytarget, as shown in Eq. 3. Hence, the aim is to reduce the loss toward the target label.

$${x}_{\textrm{adv}}=x-\varepsilon \cdot \operatorname{sign}\left(\nabla_x L\left(x,{y}_{\textrm{target}}\right)\right)$$
(3)

The parameters in Eqs. 2 and 3 are as follows: x is the clean input image, xadv is the corresponding adversarial image, L is the model loss, ytrue is the actual label, ytarget is the target label and ϵ is the L∞ perturbation budget. FGSM is a single-step method that is efficient in terms of computation time [62]. There is also an iterative version of FGSM [63].
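A minimal PyTorch-style sketch of Eqs. 2 and 3, assuming a differentiable classifier `model` and a cross-entropy loss; it is an illustrative reading of FGSM rather than any particular reference implementation:

```python
import torch
import torch.nn.functional as F

def fgsm(model, x, y_true, eps, y_target=None):
    """Single-step FGSM. Untargeted (Eq. 2): step in the direction that increases
    the loss on the true label. Targeted (Eq. 3): step in the direction that
    decreases the loss on the chosen target label."""
    x_adv = x.clone().detach().requires_grad_(True)
    if y_target is None:
        loss = F.cross_entropy(model(x_adv), y_true)
        direction = 1.0
    else:
        loss = F.cross_entropy(model(x_adv), y_target)
        direction = -1.0
    loss.backward()
    x_adv = x_adv + direction * eps * x_adv.grad.sign()
    return x_adv.clamp(0.0, 1.0).detach()   # keep pixel values in a valid range
```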

The previously mentioned attacks operate within an Lp-norm bound and are computed independently for each sample. Newer attacks, however, aim to find a single (universal) perturbation that works for all samples of the targeted DNN model [23, 64, 65]. For more information about universal adversarial attacks, see [66,67,68].

Other strategies exist for performing white-box attacks that do not use the Lp norm to create adversarial attacks. One of these strategies is the spatially transformed adversarial attack [69, 70]. The goal of this type of attack [69] is to find a flow field that moves each pixel by a small amount, using bilinear interpolation to handle displaced positions that fall between two pixel locations, thus creating an adversarial image.

Pei et al. [71] proposed a white-box testing tool for DNNs in safety-critical applications, especially for corner-case behavior. This tool, named DeepXplore, relies on the assumption that there are at least two classifiers with the same task but trained on different datasets and with different parameters. Pei et al. introduced a new metric for measuring the number of neurons activated (that meet the DNN classifier rules) by the test input. The proposed approach has two stages: maximizing differential behavior and maximizing neuron coverage. The first stage aims to obtain a test input that causes two DNN classifiers to assign different labels to the same input. This is achieved by solving the two DNNs’ joint optimization problem, which means finding a test input that lies between the two DNNs’ decision boundaries. The second stage maximizes the neuron coverage reached by this test input, which is achieved by increasing the number of activated neurons. Before the application of DeepXplore, the DNNs classified the image with the same label; after the application of DeepXplore, they produced different labels. The framework created by Pei et al. can be utilized to systematically test a real-scenario DL system. However, generating the test input is not an easy task, especially if the DNNs have the same decision boundaries. Moreover, DeepXplore requires that the compared DNNs have the same task. In addition, if three DNNs are compared, DeepXplore guarantees only that at least one will produce a different label. This approach is not effective if the defense algorithm employs majority voting over at least three classifiers for the final classification score.

Likewise, Eykholt et al. [25] proposed the Robust Physical Perturbations (RP2) attack algorithm. This algorithm produces perturbations that survive the various dynamic physical environmental conditions encountered by autonomous vehicles, and it has been applied to road signs. It has two stages. First, the algorithm identifies the weak area in the image, taking into account various factors such as the image background, the camera’s distance from the image, and the camera viewing angle. Then, the algorithm produces a perturbation mask that is printed on white and black stickers. Finally, these stickers are attached to the physical target image in a specific location inconspicuous to the human eye. Their algorithm provides a standardized methodology to evaluate physical adversarial attacks. However, Eykholt et al. did not study the effect of lighting on the attack success rate. Finally, we summarize the presented studies on adversarial attacks in Fig. 8. First, based on the adversary’s means, we categorize each attack as physical or digital. Then, based on the adversary’s knowledge, we further classify it as black-box or white-box.

Fig. 8 Taxonomy of adversarial attack studies

Defenses taxonomy

This section presents the taxonomy of DNN defenses, namely, defense strategies, DNN defense techniques, and DNN detection techniques.

Defenses strategy

Traditional DNN security evaluation methods primarily focus on the accuracy of DNN model classification and fail to evaluate model security and reliability. To address this issue, various recent studies have proposed a number of defenses for increasing the security and robustness of DNN models [15, 16, 72,73,74]. To evaluate DNN security, two main concepts describe DNN resistance to adversarial attacks:

  1. The first concept is DNN model robustness, which refers to the minimum perturbation needed to drive an image x to an adversarial image x̄ under the model; the larger this minimum perturbation, the more robust the model.

  2. The second concept is adversarial risk, which refers to the loss function (minimized by gradient descent) of the DNN model. In the DNN learning process, the model attempts to increase its prediction score by minimizing the error with respect to the input image. To create an adversarial image, the adversary therefore attempts to maximize the loss function, i.e., to find the point within the neighborhood of x that can fool the DNN model [7].

We can categorize DNN defenses based on their goals when developed against adversarial attacks into two main models:

  1. Robust models F(., θ), which can correctly classify adversarial attacks [75,76,77,78].

  2. Robust detection models D(., θ), which can detect adversarial attacks [7].

Next, we present the DNN defense models against adversarial attacks in Fig. 9.

Fig. 9 Adversarial defense model

DNN defense techniques

Based on the existing research, defenses classified under the first goal can be categorized into five types:

Gradient obfuscation (masking)

As discussed in the “White-box attacks” section, white-box attacks follow the model’s gradients to maximize the loss. To counter this approach, the defender can mask (obfuscate) the gradients of the neural network [75, 79, 80]. This can be accomplished by using a zero gradient, a fuzzy gradient or a non-existent gradient. However, this type of defense is not effective against black-box attacks.

Hakim [17] proposed an ally-patch technique to defend DNN classifiers against adversarial attacks. The proposed technique consists of three stages: an ally patch extractor, a CNN evaluator and a final labeling decision. In the first stage, the input image is divided into equally sized ally patch candidates. The generated patch candidates are then filtered using two constraints: minimum text information and non-redundancy. The final collective set is fed into the CNN evaluator, in which each patch is fed into a separate CNN model; the models are trained on the same task and produce the same label in cases of similarity. There are three scenarios for adversarial patches. In the first scenario, the input image has a large amount of text information and does not pass the filtering process in the ally patch extractor stage. In the second scenario, the patch only partially overlaps the intended adversarial patch and is not classified as the adversary’s target class. In the third scenario, the adversarial patches are assigned to the desired adversary class. In the final labeling decision stage, one of the following mechanisms is used to make a decision: majority voting, total confidence, weighted average confidence or spanning measure. An ally patch works well on clean or adversarial images, and Hakim determined that the attack success rate was reduced by one-third. However, each of the four fusion strategies used to produce the final result had nearly the same effect. Moreover, this stage is highly time- and resource-consuming.

Likewise, Wu et al. [15] proposed a defense against 5G-based adversarial attacks on autonomous vehicles. The autonomous camera captures the road image and sends it to Mobile Edge Computing (MEC) using 5G. Then, using Singular Value Decomposition (SVD), certain areas in the captured image are filtered out to eliminate the perturbation, if any. Finally, based on majority voting, the result is returned to the autonomous vehicle for the correct action. Based on this experiment, the tail and the middle area of the image are the most effective areas. In addition, the proposed method is effective against poster-printing, sticker, CW, DeepFool and I-FGSM adversarial attacks. However, the proposed defense’s accuracy is affected if the communication switches to 3G or 4G due to signal interference. Moreover, there is no backup solution should 5G or MEC go down. Finally, when this method was applied, the accuracy of the DNN model on normal images was reduced by 1.75%.

Adversarial training

Adversarial training [7, 76] entails training models on adversarial examples generated by specific attacks; however, it cannot prevent new attacks. Moreover, it requires increased network capacity and training time, and the model’s accuracy on a clean dataset may decrease [81, 82]. AbouKhamis et al. [83] applied the min-max algorithm to a DNN model to investigate its robustness against different adversarial attacks. The min-max algorithm has two parts: the inner part aims to obtain, from the original image, an adversarial image with a high loss, and the outer part aims to minimize the adversarial loss and increase DNN robustness. However, because the researchers used adversarial training as a defense method, the classifier accuracy decreases.
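A minimal sketch of one min-max training step, assuming a PyTorch classifier and using a single FGSM step for the inner maximization (the work cited above uses the full min-max formulation; a multi-step PGD inner loop would be a closer match):

```python
import torch
import torch.nn.functional as F

def adversarial_training_step(model, optimizer, x, y, eps):
    """Inner step: craft a high-loss adversarial batch at the current weights.
    Outer step: update the weights to minimize the loss on that batch."""
    x_adv = x.clone().detach().requires_grad_(True)
    F.cross_entropy(model(x_adv), y).backward()                         # inner maximization (one FGSM step)
    x_adv = (x_adv + eps * x_adv.grad.sign()).clamp(0.0, 1.0).detach()

    optimizer.zero_grad()
    loss = F.cross_entropy(model(x_adv), y)                             # outer minimization on the adversarial batch
    loss.backward()
    optimizer.step()
    return loss.item()
```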

Certified defense

Certified defenses [84] check model robustness against attacks by determining how many samples cannot be attacked in the DNN model. This is done by defining a security parameter ϵ that must be less than the Lp norm used. Then, an ϵ-bounded ball, with a radius able to resist the identified Lp perturbations, is determined around each input sample [52].

Denoiser

The goal of this defense [78, 82, 85] is to remove noise from the adversarial image. The authors of [82] have proposed a denoiser that attempts to minimize the loss function between the output of the original image and the output of the adversarial image.

Sutanto and Lee [78] proposed a defense technique against adversarial attacks based on deep image prior (DIP). This algorithm works by eliminating noise from the adversarial image. The goal of this defense is to construct a noiseless image after a number of iterations; with each iteration, the parameters are updated according to the DIP loss function. The researchers compared two images to prove the algorithm’s effectiveness. The first image is the original (clean) image minus the adversarial image. The second image is the denoised image (after applying DIP) minus the original. Sutanto and Lee found that the second image contained no adversarial pattern. The experimental results show the effectiveness of DIP against FGSM; however, the number of iterations needed to obtain the denoised image was not discussed. Moreover, the time needed was not discussed, so its potential effectiveness in safety-critical applications cannot be determined from this study. In addition, the CNN accuracy on the original dataset before applying DIP was 95%, while the accuracy after applying DIP was 90%. This implies that DIP decreases the algorithm’s accuracy on clean images.

Hu et al. [86] proposed a denoising process combined with a chaotic encryption defense against adversarial attacks. Their approach works in three stages. First, the input image is encrypted using a discretized baker map. Then, the encrypted input image is passed into a U-net denoiser classifier. Finally, the denoised input image is decrypted. The proposed approach is easy to implement and suitable for high-resolution images. The approach was evaluated both with and without encryption. The denoised input without encryption was effective against FGSM but failed against PGD attacks. The denoised input with encryption was effective against PGD attacks, but the model’s accuracy against FGSM and on clean images was reduced. In addition, the proposed approach only works on square image classifiers.

Preprocessing

Preprocessing-based methods include image transformations [77, 87], generative adversarial networks (GANs) [88,89,90], noise layers [91, 92], denoising auto-encoders [93, 94], and dimensionality reduction [95,96,97]. The goal of these methods is to perform various transformations on the adversarial image to remove adversarial noise and send the preprocessed image to the target model. The previous studies were evaluated on a small subset of images [98].

Qiu et al. [99] proposed a preprocessing function that is applied to the input sample to remove any adversarial noise before it is fed to the DNN classifier. The proposed approach consists of two major steps. In step one, the Discrete Cosine Transform (DCT) is used to transfer the pixels into the frequency coefficient space. These frequency coefficients are then quantized with a novel quantization technique. Finally, the result is de-quantized, and the inverse DCT is used to transform the pixels back to the spatial space. Step two applies an additional image distortion as a preprocessing function. This step is novel in that it introduces random variance without influencing the classifier’s performance on the clean versus the transformed image. The proposed preprocessing step drops a predefined ratio of image pixels and modifies a large number of pixel coordinates. This function provides three security requirements: usability, defensive quantization and approximation difficulty. The proposed approach outperformed comparable defenses such as FD [97], Rand [77], SHIELD [100], TV [87], JPEG [87], BdR [101], and PD [102]. However, the defense is specifically aimed at gradient-based adversarial attacks.
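For intuition, the following is a generic JPEG-like sketch of the step-one pipeline (DCT, quantize, de-quantize, inverse DCT). It is not the authors' quantization scheme, and the uniform step size is an illustrative assumption:

```python
import numpy as np
from scipy.fft import dctn, idctn

def dct_quantize(img, step=16.0):
    """Move pixels to the frequency space, coarsely quantize the coefficients
    (discarding small, potentially adversarial components), then return to the
    spatial space."""
    coeffs = dctn(img.astype(np.float64), norm="ortho")
    dequantized = np.round(coeffs / step) * step
    restored = idctn(dequantized, norm="ortho")
    return np.clip(restored, 0, 255).astype(np.uint8)
```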

DNN detection techniques

Defenses classified under this second goal check whether the image is clean or adversarial before feeding it to the DNN. If the input image is adversarial, it is rejected and does not pass to the DNN classifier. Research focused on this goal includes [16, 18, 80]. These approaches identify features that are satisfied by natural (real) images but not by adversarial (fake) images. However, this technique is not effective against white-box attacks in which the adversary has knowledge of the identified features.

Feature squeezing is an important technique in adversarial image detection. It works by reducing the search space available to an adversary and detecting adversarial images. This can be done by applying transformations such as bit-depth reduction and spatial smoothing, which do not change the semantics of real images. The detection model performs two predictions: the first on the input image without any transformation, and the second on the input image after the transformation is applied. The model then computes the distance between the two prediction vectors and compares it with a specific threshold. If the input image is clean, the two predictions, before and after transformation, will be nearly the same [16].
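A minimal sketch of this detector, assuming a `predict_fn` that returns a probability vector and using bit-depth reduction as the only squeezer (the original method also applies spatial smoothing and takes the maximum distance over several squeezers); the threshold is an illustrative assumption:

```python
import numpy as np

def reduce_bit_depth(x, bits=3):
    """Squeeze pixel precision to `bits` bits per channel (x scaled to [0, 1])."""
    levels = 2 ** bits - 1
    return np.round(x * levels) / levels

def looks_adversarial(predict_fn, x, threshold=1.0, bits=3):
    """Compare predictions on the raw and the squeezed input; a large L1 gap
    between the two probability vectors flags the input as adversarial."""
    p_raw = predict_fn(x)
    p_squeezed = predict_fn(reduce_bit_depth(x, bits))
    return float(np.abs(p_raw - p_squeezed).sum()) > threshold
```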

Li and Velipasalar [18] have proposed a novel weighted average precision (wAP) frame distance metric to detect adversarial objects in autonomous vehicles. The proposed approach has two stages. The first stage is the frame distance metric algorithm, which calculates the differences between the results of two detected object images. Then, based on the frame distance result, the temporal detection score is calculated to determine if this image is adversarial or not. The proposed algorithm focuses on a single frame. Moreover, the proposed algorithm performed better than the existing single-frame detection method. However, experimental results show that the wAP outperformed mean average precision (mAP) in white-box attacks, but wAP and mAP yielded nearly the same results for black-box attacks (Gaussian noise and brightness). In addition, the proposed algorithm cannot be applied to images.

Likewise, Xiao et al. [103] proposed AdvIT, an adversarial detection method for video frames in autonomous vehicles. AdvIT takes the target video frame xt and reconstructs the optical flows between frame xt and its previous frames. AdvIT then generates pseudo frames by transforming the estimated flows with small randomness and checks the consistency between frame xt and the pseudo frames. If the compared frames are consistent, the target frame is clean. AdvIT is the first detection method based on the temporal consistency of video frames, and it is not time-consuming. Moreover, AdvIT showed its effectiveness in three video tasks: semantic segmentation, human pose estimation and object detection. However, Xiao et al. compared their method only with JPEG, and both methods achieved the same detection rate in semantic segmentation and human pose estimation when k = 1, where k is the number of previous video frames considered. Moreover, this approach can only detect adversarial attacks and does not provide any defense mechanism.

Next, we summarize the presented adversarial defense studies in Fig. 10.

Fig. 10 Adversarial attack defense studies

DNN evaluation framework

Traditional DNN security evaluation methods primarily focus on the accuracy of DNN model classification and fail to evaluate the security and reliability of such models [8, 9, 20]. One way to evaluate the behavior of DNN classifiers is to use behavioral testing, which validates the relationship between the model’s inputs and its output behavior by performing various tests on system capabilities [30, 54, 104]. Behavioral testing is done without any knowledge of the system’s internal structure. The basic idea is to test whether the model behaves correctly under various conditions [105]. In traditional program testing, it is preferable to generate more test cases to cover all possible cases and detect a code error if it exists. Following the same principle, a systematic method must be established that can generate test inputs capable of detecting unexpected or erroneous behavior in DNNs. This problem has been noted by many researchers [20, 30, 54, 106]. It is difficult to generate test data that are representative of the large input data space (dataset) while meeting various criteria, such as realism and diversity. Moreover, a suitable oracle is needed to explore the entire input-output data space; an effective oracle determines whether the DNN produces the correct behavior when a test input is fed to it. Identifying the correct behavior for every test input is also challenging in complex domains such as autonomous vehicles [20, 30]. However, an oracle can be defined as the relationship between a certain type of test input and the expected behavior. According to Riccio et al., 12 studies have been conducted to trace the oracle issue in machine learning [20].

Like traditional programs, DNN classifiers need to be tested and verified before deployment in the real environment. Two important issues need to be considered in DNN testing: testing criteria, such as neuron coverage [71], and testing strategies, such as coverage-guided testing [30].

Tian et al. [30] proposed a systematic methodology called DeepTest to evaluate DNN classifier behavior in autonomous vehicles. Their method consists of several steps. First, the input-output space of the DNN logic is explored through the application of neuron coverage. Second, various image pixel transformations and affine transformations are performed. Third, neuron coverage is increased by combining different types of transformation based on a guided greedy search algorithm. Last, metamorphic relations that map each transformed input to the expected output are used to automatically identify erroneous behavior in the DNN. DeepTest generates test samples that mimic real environmental changes in driving conditions, such as rain or lighting changes. In addition, this approach focuses on generating test samples for corner cases to detect DNN misbehavior. However, DeepTest cannot guarantee that it will generate synthetic images that cover all real cases. In addition, DeepTest was designed to test only the steering angle actions taken by autonomous vehicles.
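The neuron coverage criterion used by DeepTest and DeepXplore [71] can be sketched as follows, under the simplifying assumption that per-layer activations have already been collected (for example, with forward hooks) and scaled; the 0.25 threshold is an illustrative choice:

```python
import numpy as np

def neuron_coverage(layer_activations, threshold=0.25):
    """Fraction of neurons activated above `threshold` by at least one test input.
    `layer_activations` is a list of arrays of shape (num_inputs, num_neurons)."""
    covered = total = 0
    for acts in layer_activations:
        fired = (acts > threshold).any(axis=0)   # a neuron counts once it fires for any input
        covered += int(fired.sum())
        total += fired.size
    return covered / total
```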

Table 3 presents a summary of the existing DNN behavioral tests, attacks, and defenses, including their advantages and limitations.

Table 3 Comparison of the existing models of adversarial attacks, defenses, and test frameworks

Discussion and future research directions

The previous subsections discussed various black-box and white-box attacks on autonomous vehicles, the solutions currently proposed against adversarial attack techniques and various studies on DNN behavioral tests. The safety of autonomous vehicles relies on two important components: DNN classifiers and the sensors that feed these classifiers with captured images. This section first discusses observations drawn from the studies mentioned above and summarizes the challenges (see Fig. 11). It then suggests possible future research directions (see Fig. 12) to make autonomous vehicles safer.

Fig. 11 The main challenges

Fig. 12 Future directions

Discussion

Trained DNN model errors

Autonomous cars rely on a DNN-based perception system to detect objects and drive on their own (without a human driver). However, DNN algorithms have vulnerabilities, bugs or errors, which may lead the perception system in autonomous cars to misclassify objects and cause car accidents [3, 4]. One example of such an autonomous car accident involved Uber [107, 108], when an autonomous car misclassified a pedestrian and failed to prevent a collision. In safety-critical applications, it is important to ensure that the trained DNN model for object detection is bug-free and robust.

Autonomous vehicle sensors

An autonomous vehicle without sensors is blind, like a human without eyes. Attacks on autonomous vehicles can be generated against any sensor. The most important sensors in self-driving vehicles are lidar, radar, and camera sensors. Researchers have shown through experiments on SADAs that self-driving vehicle camera sensors are vulnerable to small perturbations in the camera position or camera view angle [56].

Adversarial attack

DNN security and robustness in autonomous vehicles represent one of the biggest challenges in this research area [109]. One of the most serious security issues in DNNs is the threat of adversarial attacks. Adversarial attack generation is not an easy task, whether the attack is black-box or white-box. If the DNN model architecture is complex and masks the important model parameters that could be used to generate the attack, designing a white-box attack is difficult. Developing a black-box attack is even more difficult, because black-box attacks are generated through digital means, such as a poisoning attack, or physical means, such as modifying the environment, without any knowledge of the classification architecture in the vehicle. Moreover, each defense presented is designed to counter a specific attack.

DNN training privacy

We have presented various studies that test DNN classifiers and evaluate their expected behavior; however, there are some limitations, such as scalability and diversity. Researchers urgently need to find a systematic methodology for testing DNNs in safety-critical applications. The dataset used to test DNN classifiers is an important factor in exploring the various unexpected behaviors of DNN classifiers. This can be done by generating input test samples that cover the large input space of deep learning models, which is a difficult goal to accomplish because these samples must meet different criteria such as increasing neuron coverage, diversity and realism [20]. Test input samples can be generated through various methods, such as adversarial attacks. However, adversarial samples cover only a small subset of the features learned by DNNs, as these samples were not generated to maximize neuron coverage [71].

Trade-off between adversarial accuracy and standard accuracy

Defenses built on adversarial training need to take this trade-off into consideration. DNN models perform excellently on training data, with high accuracy [78]. After adversarial attacks are generated for the model and the model is retrained on these data, its accuracy on clean images is reduced, while its robustness against attacks is enhanced. Researchers are beginning to investigate this topic [110, 111], but more work needs to be done on this type of defense to balance the model’s accuracy.

Future directions

DNN security and robustness

DNN classifiers have to resist any modification that can result in an object or image being misclassified. Therefore, there is a need to develop a systematic approach or test tool to evaluate DNN robustness [25]. One way to do this is to test DNN classifiers against unknown attack scenarios. Likewise, research must be conducted to develop a general technique that can defend against existing attack methods. In addition, the DNN classifier must robustly and accurately detect objects in the real-time environment, whether a fake object has been added or a real object has been deleted or modified. Moreover, this defense should be applicable to various types of attacks; adversarial training methods, for example, are robust only against the attacks for which they were generated. The defense must also be secure against black-box attacks, and it must have countermeasures in case the DNN comes under attack, such as the following:

  • Isolating the DNN classifier from decisions and waiting for an action from a central authority.

  • Updating the system through 5G to double-check final decisions.

  • Remotely controlling the vehicle through a central authority.

Toward certified epsilon robust defense

The development of various tests is needed to evaluate DNN classifier robustness in autonomous vehicles. In theory, this can be done if the DNN model in question has a certified bound around each input image, meaning that any modification within that bound will result in the same correct classification [112].

Autonomous vehicles’ sensor enhancement

Increasing the number of sensors will provide better environmental data and increased sensor redundancy, but it will also increase the cost of autonomous vehicles. To counter this system weakness, we suggest the following possibilities:

  1. DNN classifiers need to be trained with more high-quality data and various images captured from different positions and angles. This means there is a need to build a set of accurate and up-to-date images or videos depicting the autonomous vehicle environment. In the presented studies, researchers applied their algorithms to known datasets with limited classes and images. Thus, there is a need to create new datasets to train DNNs on various scenarios.

  2. The number of camera sensors in autonomous vehicles needs to be increased, and they need to be positioned to capture images from various angles. This will increase classifiers’ opportunities to make correct predictions. This can be done using multiple DNN classifiers: each camera sensor feeds an image to a classifier, each classifier produces a prediction for that image, and a final-stage component such as majority voting or weighted average confidence produces the final prediction score (see the sketch after this list).

  3. Lidar sensors can capture objects as 3D point clouds and feed them to DNNs for classification. 3D point cloud models suffer from adversarial attacks, as presented in [113]. Early work suggested that these attacks could be easily rebuffed using simple strategies such as random sampling and denoising [114] and that 3D neural network adversarial attacks do not transfer to other unseen 3D networks [114, 115]. However, newer studies [43, 115] have shown that 3D adversarial attacks do transfer to unseen 3D networks with a success rate of 40% and can break existing defenses [115]. Little research has focused on 3D neural network attacks and defenses [43, 114, 115]. Researchers need to move forward and develop attacks and defenses to increase 3D DNN robustness and reliability.
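A minimal sketch of the majority-voting fusion mentioned in point 2, assuming each camera-fed classifier outputs a class label:

```python
from collections import Counter

def majority_vote(labels):
    """Fuse per-camera classifier outputs; fall back to the first classifier
    when no label wins a strict majority."""
    label, votes = Counter(labels).most_common(1)[0]
    return label if votes > len(labels) // 2 else labels[0]

print(majority_vote(["stop", "stop", "speed_45"]))   # -> "stop"
```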

DNN test data generation

This highlights the need to design a good metamorphic mutation strategy, including image distance metrics and image transformations, to generate samples whose semantics remain as close to those of the original sample as possible. The generated samples should be as realistic as possible to mimic various real-world scenarios. The aim of this method is to generate unseen samples that help models analyze and explore different cases. In addition, producing large sets of test samples requires a huge manual labeling effort. To address this issue, researchers have begun using a metamorphic oracle to map a group of test inputs to the correct behavior (label) and to measure whether the test input meets the expected behavior [30, 54].

Conclusions

Deep neural networks (DNNs) are rapidly emerging as a means of classifying images and objects with high accuracy. DNNs serve as the foundation for many useful applications, such as facial detection and recognition systems and the safety-critical applications that are of supreme importance for the successful operation of autonomous vehicles. Indeed, one of the most eagerly awaited widespread application domains promised by DNNs is that of autonomous vehicles.

Given our growing reliance on DNNs, concerns have been raised regarding their security and reliability. In this survey, we have presented the state-of-the-art research on DNN behavioral tests, adversarial attacks and defenses and have discussed each work with its advantages and limitations. Moreover, we have presented our thoughts on DNN behavioral tests, adversarial attacks and defenses and have recommended future directions for this field of study.

This paper concludes that research needs to be carried out regarding general adversarial attacks to develop defenses that are robust against various types of attacks. In addition, a defense that offers a balance between the standard accuracy of DNN models before and after training on adversarial attack samples must be developed. Moreover, researchers must focus on developing models that offer certified robust defenses. Also, research must be performed on increasing the number of autonomous vehicle sensors to provide better environmental data and increased sensor redundancy. Finally, researchers must develop a systematic methodology for evaluating DNNs in autonomous vehicles.