Dual-filtering (DF) schemes for learning systems to prevent adversarial attacks

Defenses against adversarial attacks are essential to ensure the reliability of machine-learning models as their applications expand into different domains. Existing ML defense techniques have several limitations in practical use. We propose a trustworthy framework that employs an adaptive strategy to inspect both inputs and decisions. In particular, data streams are examined by a series of diverse filters before being sent to the learning system, and its output is then cross-checked through anomaly (outlier) detectors before the final decision is made. Experimental results (using benchmark datasets) demonstrate that our dual-filtering strategy can mitigate adaptive or advanced adversarial manipulations across a wide range of ML attacks with high accuracy. Moreover, inspecting the output decision boundary with a classification technique automatically affirms the reliability, and increases the trustworthiness, of any ML-based decision-support system. Unlike other defense techniques, our dual-filtering strategy requires neither adversarial sample generation nor updating of the decision boundary for detection, which makes the ML defense robust to adaptive attacks.


Introduction
Adversarial attacks (AAs) manipulate input data by adding traits/noise in various subtle ways, and such attacks on deep learning models reduce the trustworthiness of their use. It should be noted that the rationales proposed for AAs' success are inconclusive and do not provide a clear explanation for real-world applications. Szegedy et al. [78] attributed attack success to the non-linearity of ML models; on the other hand, Goodfellow et al. [34] argued that AAs take advantage of linearity in some ML models. Another theory [80] proposed a tilted-boundary explanation: it is never feasible to fit a model completely, and that is why AAs exist. Some MIT researchers stated that not all adversarial features are additive noise; rather, such data cannot be properly classified because human senses are not sophisticated enough to associate a class with them. However, this argument is disputed by other researchers [41]. To build a robust ML/AI-based system against malicious adversaries, we designed a dual-filtering scheme, which employs an end-to-end defense mechanism: one filter set at the input stage (before samples are fed to the core learning model) and the other at the output of the ML model (before the decision component). These two filter sets can function independently as well as dependently (i.e., in a commutative fashion). Specifically, the input filtering layer's main aim is to drop misleading and out-of-distribution inputs (e.g., an image of an animal, but not a human face, in a face recognition system). The output filtering layer's goal is to handle larger variations and restrict misclassification to improve the overall accuracy of the learning system. The proposed dual-filtering strategy can be used in both the training and testing phases of ML-based systems. For instance, the independent input filters may be used to detect and deter poisoning attacks in supervised ML. Likewise, the dual-filtering strategy helps address adversaries in both supervised and unsupervised ML.
A machine learning (ML) framework usually consists of four main modules: feature extraction, feature selection (optional), classification/clustering, and decision. As depicted in Fig. 1, the input filters are placed after data-stream pre-processing/feature selection to feed the learning model, and the output filters are placed after the classification/clustering/raw-decision module, respectively.
As can be seen in Fig. 1, the raw input sample is first pre-processed and then fed to the input filter, which determines whether the received feature/sample is clean or noisy/adversarial, and accepts or rejects it accordingly. The outcome of the ML module is given to the output filter for further scrutiny. The output filter uses context information and/or communicates with the input filter bank to make the correct final decision. An ensemble of different noise-removal or AA-detection filters was successfully applied in a recent work [29]. Other techniques have focused mostly on adding an extra layer to an ML module through adversarial sample training or modification of deep learning models; these defense methods have some constraints and have exposed ML models to new vulnerabilities [36].
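The accept/reject flow described above can be sketched as a thin wrapper around any classifier. The filter stub, thresholds, and the constant "model" below are hypothetical placeholders for illustration only, not the filters used in the paper:

```python
import numpy as np

def input_filter(x, threshold=0.5):
    # Hypothetical noise metric: how far pixel values stray from a
    # plausible clean range. A real deployment would run a filter sequence.
    residual = x - np.clip(x, 0.1, 0.9)
    return np.abs(residual).mean() < threshold   # True -> accept

def dual_filter_predict(x, model, class_inlier_check):
    # Input filter -> ML model -> output (outlier) filter.
    if not input_filter(x):
        return None                      # rejected before the ML model
    pred = model(x)
    if not class_inlier_check[pred](x):
        return None                      # rejected by the output filter
    return pred                          # accepted final decision

# Tiny demo with a constant model and an always-accepting class check.
demo_model = lambda x: 0
demo_checks = {0: lambda x: True}
clean_sample = np.full(16, 0.5)
noisy_sample = np.full(16, 5.0)
accepted = dual_filter_predict(clean_sample, demo_model, demo_checks)
rejected = dual_filter_predict(noisy_sample, demo_model, demo_checks)
```

A call returns the class label only when both inspection stages accept the sample; otherwise it returns a rejection.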
In 2019, some works reported launching adaptive attacks that could bypass known defenses [17]. To alleviate this situation, we consider a non-deterministic (white-box) approach in which attackers cannot perceive our defenses well enough to launch adaptive attacks. Accordingly, we investigated an active-learning-based [73] dual-validation scheme that works as an extra security (filtering) layer and improves the learning model's trustworthiness.
Accordingly, our defensive measures for a machine learning model (MLM) have the following tasks:
-Input filter before the MLM: The primary purpose of the input filters is to block adversarial input data by differentiating manipulated data from the training data. The input is examined by deploying an application-specific filter sequence. A set of filter sequences is selected (from a given library of filters) using an efficient search and optimization algorithm, a multi-objective genetic algorithm (MOGA). The MOGA finds sequences of filters (where each filter can detect adversarial traits/noise) satisfying constraints and three objectives: detection of the maximum number of attacks with accuracy above a specific threshold, minimum processing time, and a shorter ensemble filter sequence. Utilizing the Pareto set from MOGA runs and picking a filter sequence dynamically at different times makes the filter selection unpredictable, and an active learning approach protects the ML model from adaptive attacks.
-Output filter after the MLM: Employ several class-specific latent-space-based transformations for outlier detection.
After the MLM provides an output class label, we verify whether the output falls in that class's latent space. We make an ensemble of different outlier detection methods, sequence them dynamically, and also retrain the outlier methods during runtime.
The rest of the paper is organized as follows. "Preliminaries" and "Defense objectives" provide a literature review and highlight research challenges. In "Our proposed methodology" and "Experiments", we describe our approach with experimental results and analysis. In the following section, we report the advantages and limitations of our defense technique. Finally, we conclude with prospects for future work.

Preliminaries
In this section, we detail the adversarial properties we studied for our defense techniques and highlight related works.

Adversarial machine learning (AML) attacks
Based on the NIST [79] definition, AML is the manipulation of training data, the ML model architecture, or testing data in a way that results in wrong output from the ML model.
Generally speaking, adversarial examples are input data that get misclassified by an AI method but not by the human eye. Mathematically, given a classifier f, a clean input x with true label y, and a perturbation budget ε, an adversarial example x′ satisfies f(x′) ≠ y while ‖x′ − x‖_p ≤ ε, so the perturbation remains imperceptible.

Adversarial defense
Goodfellow et al. [34] trained on adversarial inputs proactively, Papernot et al. [66] performed defensive distillation, and Miyato et al. [57] trained the network with enhanced training data, all to create protection against adversarial examples. Grosse et al. [35] used statistical tests as a complementary approach to identify specific inputs that are adversarial. Wong et al. showed that a convex outer adversarial polytope can serve as a provable defense [92]. Lu et al. [52] checked whether the depth map is consistent (only for images) to detect adversarial examples. Metzen et al. augmented deep neural networks with a small "detector" sub-network trained on the binary classification task of distinguishing factual data from data containing adversarial perturbations [56]. The same year, Madry et al. [55] published a paper on the adversarial robustness of neural networks through the lens of robust optimization. Chen [82] and Xu et al. [95] simply reduced the feature space to protect against adversaries. Monteiro et al. [59] developed an input filter based on the bi-model decision mismatch of an image. Sumanth Dathathri checked whether prediction behavior is consistent with a set of fingerprints (a dataset of the NN), named the NFP method [31]. The same year, Crecchi et al. used non-linear dimensionality reduction and density estimation techniques [27], and Aigrain et al. used the confidence values of a CNN [3]. Another notable work that year was a meta-learning-based robust detection method for detecting new AAs with limited examples, developed by Ma et al. [54]. Another important and effective work was done by Chen et al., who kept records of queries and used KNN to correlate them with adversarial examples [25]. In Table 1, we summarize the adversarial defenses.

Nature of adversarial attacks
From our extensive literature review and empirical examination, we observed six basic characteristics of adversarial attacks. These are:

Advanced AAs are ineffective in the physical environment
Advanced adversarial methods are those that add less noise/perturbation than other methods. In 2017, [47] showed that the effectiveness of adversarial methods declines between the digital version and the printed version of a sample. They justified their argument with FGSM, BIM, and other iterative methods. [53,61] experimented with the FGSM, BIM, and L-BFGS methods and showed destruction rates of up to 100% depending on distance, invalidating these attacks.

Clean and adversarial inputs have identifiable noise difference
Researchers [5,37,68] demonstrated that adversarial and clean images have a measurable difference in their noise values, which is identifiable for attacks such as FGSM, BIM, etc. It was also illustrated that a normal filtering technique highlights the noise component after the pixel-difference method [37], and these noises can be detected using other metrics such as the histogram average, local binary pattern, signal-to-noise ratio (SNR), etc. [29]

Table 1 Summary of adversarial defenses (defense technique: approach/scheme):
-Adversarial training: ensemble adversarial training, a training methodology that incorporates perturbed inputs transferred from other pre-trained models [86]; extended adversarial and virtual adversarial training as a means of regularizing a text classifier by stabilizing the classification function [58]; training a state-of-the-art speech emotion recognizer on a mixture of clean and adversarial examples to help regularization [21].
-Defensive distillation: train the model twice, initially using the one-hot ground-truth labels but ultimately using the initial model's probabilities as outputs, to enhance robustness [65,75].
-Pre-processing defense: use PCA, low-pass filtering, JPEG compression, and soft-thresholding techniques as pre-processing to improve robustness [74]; use two randomisation operations: (1) random resizing of input images and (2) random padding with zeros around the input images [94].
-Architecture alteration: a synonym-encoding method that inserts an encoder before the input layer of the model and then trains the model to eliminate adversarial perturbations [90]; an architecture using Bayesian classifiers (Gaussian processes with RBF kernels) to build more robust neural networks [12].
-Network verification: a verification algorithm for DNNs with the ReLU function, proposed in [43], which verifies neural networks using a satisfiability modulo theories (SMT) solver; the method in [43] was later extended to handle max(x, y).
-Ensembling countermeasures: an ensemble of classifiers with a weighted/unweighted average of their predictions to increase robustness against attacks [76]; a probabilistic ensemble framework against adversarial examples that capitalizes on intrinsic depth properties (e.g., probability divergence) of DNNs [1].
-Adversarial detection: first, the features are squeezed, either by decreasing each pixel's color bit depth or by smoothing the sample with a spatial filter; then a binary classifier uses as features the predictions of a target model before and after squeezing of the input sample [95]; a framework that utilizes ten non-intrusive image-quality features to distinguish between legitimate and AA samples [4]; a multiversion-programming-based audio AE detection approach, which utilizes multiple off-the-shelf automatic speech recognition systems to determine whether an audio input is an AE [97]. (Here we only report the successful detection rate.)
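As a concrete illustration of one such noise metric, the following sketch computes an SNR-style score from the residual left by a simple denoiser. The box-blur stand-in and the synthetic images are illustrative assumptions, not the paper's filter bank:

```python
import numpy as np

def snr_db(image, denoised):
    # SNR of an image against its denoised version: the residual
    # approximates the (possibly adversarial) noise component.
    noise = image - denoised
    signal_power = np.mean(image ** 2)
    noise_power = np.mean(noise ** 2) + 1e-12   # avoid division by zero
    return 10.0 * np.log10(signal_power / noise_power)

def box_blur(image, k=3):
    # Simple mean filter as a stand-in denoiser (no external dependency).
    pad = k // 2
    padded = np.pad(image, pad, mode="edge")
    out = np.empty_like(image, dtype=float)
    h, w = image.shape
    for i in range(h):
        for j in range(w):
            out[i, j] = padded[i:i + k, j:j + k].mean()
    return out

rng = np.random.default_rng(0)
clean = np.linspace(0.0, 1.0, 64).reshape(8, 8)       # smooth "clean" image
noisy = clean + rng.normal(0.0, 0.5, size=(8, 8))     # heavily perturbed copy
clean_score = snr_db(clean, box_blur(clean))
noisy_score = snr_db(noisy, box_blur(noisy))
```

A perturbed input leaves a larger residual after denoising, so its SNR score drops below that of the clean input, which is the separation the cited works exploit.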

The same filtering technique will work for all ML models on a specific dataset
Filters can detect AAs in the data pre-processing stage [5,37,68]. That means this technique works for black-box models, i.e., the defense does not need to access or modify the ML model. So, if the ML model changes, for example from ResNet to VGG or from SVM to a random forest, the defense technique needs no changes; it is completely independent of ML model changes.

Different filters have different effectiveness to detect AAs
We experimented with different filters, as presented in Table 2. We can see that noise-addition and noise-cancelling filters work better for gradient-based attacks, while texture-based filters work better for boundary-based attack types. For example, FGSM and BIM are both gradient-based attacks, and we found that blur works against both of these attacks. This result is expected, as AA noises have a distinct nature related to the attack method. This phenomenon suggests that picking one filter from each filter family is more effective than selecting all filters from the same filter-family class. In this experiment, we generated FGSM [34], BIM [55], PGD, and JSMA [91] samples using PyTorch [33], the IBM ART toolbox [62], and the Cleverhans adversarial library [64]. We noticed that the destruction rate (i.e., the rate of failure of an AA when it is converted to physical form) [47] was present in some attack samples; we disregarded those attack samples. Also, due to our restriction of ε = 0.03 as the maximum noise value, we had to discard some examples from our dataset for having higher noise.
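The ε = 0.03 budget check used to discard over-perturbed samples can be expressed directly as an L-infinity comparison; the variable names and toy arrays below are illustrative:

```python
import numpy as np

def within_budget(clean, adv, eps=0.03):
    # L-infinity distance between a clean sample and its adversarial version.
    return float(np.max(np.abs(adv - clean))) <= eps

def filter_attack_set(clean_batch, adv_batch, eps=0.03):
    # Keep only adversarial samples whose perturbation stays within eps.
    return [a for c, a in zip(clean_batch, adv_batch)
            if within_budget(c, a, eps)]

base = [np.zeros(4), np.zeros(4)]
advs = [np.full(4, 0.02), np.full(4, 0.10)]   # second exceeds eps = 0.03
kept = filter_attack_set(base, advs)
```

Only the first perturbed sample survives the budget check; the 0.10-perturbation sample is discarded, mirroring the dataset pruning described above.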

Outlier detection methods can detect AAs as outliers
The work of Ruff et al. [71] shows that outlier detection methods can separate class samples from outlier samples. In multi-class classification, each class separately generates its own latent space; outliers are detected there as the negative class and inliers as the positive class, achieving 95%+ accuracy for MNIST classification. We experimented with a similar approach on adversarial samples for single-class classification. We took class label '0' as the positive class (inliers); all other nine classes and the adversarial samples for class 0 were considered the negative class (outliers). We trained with 1000 positive-class samples, using the outlier library developed in [101]. We tested with 500 positive samples and 500 adversarial samples (FGSM samples generated using [33,62]) of class label 0. The accuracy is presented in Table 3. We can see that the one-class support vector machine and the V-detector-based negative selection algorithm do better than the others.
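A minimal sketch of the single-class setup, using scikit-learn's OneClassSVM on synthetic stand-in features. The two 16-dimensional clusters below are fabricated for illustration; the paper instead uses MNIST latent features and the outlier library of [101]:

```python
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)
# Fabricated feature clusters standing in for class-0 inliers and
# adversarial outliers.
train_inliers = rng.normal(0.2, 0.05, size=(1000, 16))
test_inliers = rng.normal(0.2, 0.05, size=(500, 16))
test_outliers = rng.normal(0.8, 0.05, size=(500, 16))

# Fit on positive (inlier) data only; predict returns +1 for inliers
# and -1 for outliers.
ocsvm = OneClassSVM(nu=0.05, kernel="rbf", gamma="scale").fit(train_inliers)
inlier_acc = float((ocsvm.predict(test_inliers) == 1).mean())
outlier_acc = float((ocsvm.predict(test_outliers) == -1).mean())
```

With well-separated clusters the one-class model accepts most held-out inliers while flagging essentially all outliers, which is the behavior Table 3 reports for OCSVM.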

Static defenses can be bypassed by adaptive attacks
Carlini et al. [17] exhibited an adaptive attack in which the attacker can bypass known defenses. So, if the defense does not change, i.e., remains static, it is vulnerable to adaptive attacks. More recent work also showed that a dynamic defense mechanism claiming effectiveness against adaptive attacks fails against gradient-based adaptive attacks [85]. The above characteristics of adversarial attacks lead us to conclude that filter-based techniques can detect noise and outlier detection methods can distinguish between adversarial and clean inputs, but an adaptive attack can be designed to bypass these defense techniques.

Defense objectives
Researchers have evaluated several defense techniques [8,16,17,19,20,85], but these evaluations focused on how many different types of attacks a defense technique can defend against, especially prioritizing effectiveness against adaptive attacks. However, many of these defense techniques have implementation issues, such as needing prior knowledge of the ML model and dataset. Some of these techniques require modifying the ML model's layers or retraining the model; retraining can also reduce the model's efficiency. Some defense techniques have high computational costs and are not suitable for the ML model's different domains. Yuan et al. [96] suggested making threat models consist of adversarial falsification (false negatives, false positives), white-box, black-box, targeted, non-targeted, one-time, and iterative attacks. Carlini et al. [17] suggested that adversarial attack and defense models need to be tested against a diverse set of attacks, and that they need to be evaluated against adaptive attacks. Moreover, Tramer et al. [85] suggested different themes for evaluating a defense model. Keeping these guidelines in mind, we developed our threat model to include basic, advanced, and adaptive attack types (against our defenses). Carlini et al. [19] also recommended using at least one gradient-free and one hard-label attack. To address that concern, we evaluated our proposed method with gradient-free attacks such as the local search attack [60] and the hop-skip-jump attack [23]. For testing against an adaptive attack, we used BPDA (Backward Pass Differentiable Approximation [9]), which can be used to attack non-differentiable preprocessing-based defenses. Uesato et al. [87] advised considering the obscurity of an adversarial attack when designing defenses. [17] pointed out that testing a defense on one dataset is not enough; therefore, we chose multiple datasets (i.e., MNIST, CIFAR-10, and ImageNet).
We considered a standard distortion of ε = 0.3 for MNIST and ε = 8/255 for CIFAR-10, as recommended by the current state of the art [85]. Thus, our threat model is a combination of gradient-based, gradient-free, and adaptive evasion-based adversarial attacks on multiple datasets. The attacks studied in this work are a combination of white-box, black-box, targeted, and non-targeted attacks. In addition, the presented defense should be able to defend against attacks that are completely unknown to the proposed defense scheme. With the above researchers' suggestions in mind, we aim to provide an adversarial defense system that meets the following objectives:
-The defense needs to work against a diverse set of attack types: gradient or gradient-free, white-box or black-box, targeted or non-targeted, and adaptive attacks [17].

Our proposed methodology
In Fig. 2, we illustrate the basic concept of our proposed solution. Since it is possible to detect adversarial input noise using different filters, we apply filters to detect noise. We need to know which filters are required and the threshold separating clean from adversarial noise. That is why we first use information from the ML model to determine whether the input is an outlier for the class label the ML model assigned. If it is an outlier, we send it to the adversarial dataset. If not, we send it to the clean dataset, update the outlier methods' decision boundaries, and determine the required filters and noise thresholds. Before updating/retraining the output and input learning models, we inspect the data for adaptive attack patterns in the adaptive attack detection module. Fig. 3 illustrates our proposed dual-inspection strategy. As shown, the inspections before and after the ML model are independent and can be deployed as plugins. As in active learning, once the clean dataset has some data, it trains the outlier detection techniques, and the 'inspection after ML' module starts to work. After the outlier detector finds some adversarial examples, the adversarial dataset receives data. When the adversarial dataset has sufficient data, our multi-objective genetic algorithm starts the genetic search for filter sequences that are effective against the adversarial noises, along with the differentiating noise thresholds for these sequences. As time progresses, the MOGA detects more adversarial samples, and the knowledge of the outlier detection technique transfers to the noise detection techniques. This way, the ML model has to process fewer adversarial examples. We select a different filter sequence and a different outlier detection method for each input to make the defense dynamic.
After each input (or a specific number of inputs), the outlier methods retrain, updating the outlier detection decision boundary. Similarly, the MOGA subsequently updates the filter library. This way, both the outlier-based and filter-based defense techniques keep themselves updated as time progresses. As this method can be made vulnerable by an adaptive attack, we store the data and inspect it for adaptive attack patterns before updating the filters and outlier detection methods. We detail that method in "Adaptiveness and dynamic selection".
The basic workflow from Fig. 2 is:
1. The input is sent to the filters to extract different metrics (SNR, histogram, etc.). The filter set is selected dynamically from the filter library.
2. The extracted filter metric values are checked for perturbation; if a value is above a certain threshold, switch S1 opens, otherwise switches S2 and S3 open.
3. S1 open: the input is sent to the adversarial dataset and the process terminates; the adversarial dataset retrains the filter-sequence search for noise detection and changes the threshold values.
4. S2 and S3 open: if S3 opens, the extracted filter metric values are sent to the outlier detection system; if S2 opens, the input data are sent to the ML model and switch S5.
5. The ML model delivers the output class to S4 and the outlier detection system.
6. The outlier detection system randomly picks one outlier detection method. If the input is detected as an outlier, switch S1 opens; otherwise S4 and S5 open.
7. S1 open: the input is sent to the adversarial dataset and the process terminates; the adversarial dataset retrains the filter-sequence search for noise detection and changes the threshold values.
8. S4 and S5 open: S4 provides the final output class, and S5 sends the input to the clean dataset, which triggers retraining of the outlier methods and changes the outlier decision boundary.

Fig. 2 Illustration of the basic flow concept for the proposed dual-inspection framework. If the input is not adversarial, the original input (not the processed one) is sent to the learning model/ML; after the ML produces a class label, that label's latent space is used in the outlier method. The outlier decision boundary and the noise threshold change as the adversarial and clean datasets are updated with each input.

Multi-objective genetic search for filters
We need multiple filter sequences because we cannot use the same sequence of filters for every input. A sequence can be of any length. Searching for an optimal set of sequences exhaustively requires significant computational time when considering multiple objectives; that is why we employ a multi-objective GA to search for the optimal set of sequences as Pareto-front solutions. When searching for filters, we need to consider different factors besides their accuracy. Based on our objectives, the filters need to be fast, which is why the order of filters matters: different orders of the same filters require different amounts of processing time, and it is preferable to have time-efficient solutions. If we have N filters, the total possible number of ordered sequences forms our search space. If we did not consider time efficiency, order within a sequence would not matter (for different orderings, a sequence's accuracy remains static but its time efficiency changes). We can reduce the search space by limiting the minimum and maximum sequence lengths. Our search-space size is therefore
|search space| = Σ_{k=min}^{max} N!/(N − k)!,
where min and max are the minimum and maximum lengths of a filter sequence. Suppose we have 17 filters and the minimum length is 6; then our optimized experimental search space consists of about 9.66 × 10^14 items. This justifies the necessity of a heuristic search method like a GA. In summary, we need a time-efficient approach that produces reliable performance and a unique set of sequences; our multi-objective GA achieves all these criteria. The purpose of using a genetic search is to find diverse sequences of filters that detect AAs with maximum accuracy, where each filter has distinctive characteristics and capabilities; deploying such sequences adaptively (interchangeably) in an ML system makes them unpredictable to attackers compared to a static ensemble of well-known filters.
So, the GA finds not only the best filter ensemble but also a set of diverse filter sequences on the multi-objective Pareto front.
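The quoted search-space figure can be checked numerically: summing the ordered, duplicate-free selections of length 6 through 17 drawn from 17 filters gives roughly 9.66 × 10^14 candidate sequences.

```python
from math import factorial

def search_space_size(n_filters, min_len, max_len):
    # Number of ordered, duplicate-free filter sequences whose length
    # lies between min_len and max_len: the sum of permutations P(n, k).
    return sum(factorial(n_filters) // factorial(n_filters - k)
               for k in range(min_len, max_len + 1))

size = search_space_size(17, 6, 17)
# size is on the order of 9.66e14, matching the figure quoted in the text
```

For a sanity check on a small case, `search_space_size(3, 1, 3)` counts P(3,1) + P(3,2) + P(3,3) = 3 + 6 + 6 = 15 ordered sequences.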

Perturbation range/threshold determination of filters
First, we process the clean dataset, run all the filters, calculate their metric values, and gather their mean, standard deviation, and maximum for each filter. If the mean is μ and the standard deviation is σ, our lower range (L_r) and upper range (U_r) are calculated as denoted by Eqs. 2 and 3. Using Eqs. 2 and 3 for the 17 filters, we generated the list of upper and lower ranges.

Fig. 3 Illustration of the proposed dual-inspection framework. If the input is not adversarial, the original input (not the processed one) is sent to the learning model/ML; after the ML produces a class label, that label's latent space is used in the outlier method. The selection of outlier method and filter sequence is dynamic.
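A sketch of deriving per-filter acceptance ranges from clean-data statistics. The exact forms of Eqs. 2 and 3 are not reproduced here, so the μ ± kσ shape and the multiplier k below are illustrative assumptions:

```python
import numpy as np

def metric_range(clean_metric_values, k=2.0):
    # Acceptance range for one filter metric from clean-data statistics.
    # The multiplier k is an illustrative choice, not the paper's Eq. 2/3.
    mu = float(np.mean(clean_metric_values))
    sigma = float(np.std(clean_metric_values))
    return mu - k * sigma, mu + k * sigma

def is_clean(metric_value, lower, upper):
    # A metric value outside [lower, upper] is treated as adversarial.
    return lower <= metric_value <= upper

rng = np.random.default_rng(0)
clean_metrics = rng.normal(10.0, 1.0, size=1000)   # e.g., SNR-like values
lower, upper = metric_range(clean_metrics)
```

A value near the clean mean passes the range check, while a far-off value (as a perturbed sample would produce) is rejected.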

Encoding
First, we encoded all filters according to Table 2. Here, the 17 algorithms were assigned sequence numbers FT1, FT2, ..., FT17. These filters are our genes, from which we create our individuals/chromosomes. We generated the population by random sequence generation using the genes, so each sequence consists of a different number of filters. We remove multiple occurrences of a filter within a single sequence.
For example, the random sequence FT2-FT5-FT11-FT12 means Blur - Census - Morph - Canny. In this way, we generated sequences of multiple lengths as our initial population.
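Population initialization under this encoding can be sketched as follows; the gene labels follow the FT1...FT17 scheme, while the population size and length bounds are illustrative:

```python
import random

FILTERS = [f"FT{i}" for i in range(1, 18)]   # 17 filter genes, FT1..FT17

def random_chromosome(min_len=6, max_len=17):
    # An individual: an ordered, duplicate-free filter sequence.
    length = random.randint(min_len, max_len)
    return random.sample(FILTERS, length)     # sampling without replacement

def initial_population(size=30):
    return [random_chromosome() for _ in range(size)]

random.seed(0)
population = initial_population()
```

Sampling without replacement enforces the no-duplicate rule directly, so no post-hoc removal of repeated filters is needed.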

Fitness function
We have three objectives: accuracy (α), time to detection (β), and insider diverseness (γ) of the filters in a sequence. Accuracy is the success rate of detection by the filters. The filter sequence takes adversarial and clean samples from the dataset; based on the sequence's metric value ranges, we check how many adversarial samples we detect and how many we falsely detect. We used the F1 score as the accuracy value.
For time, if each filter takes t_i time, the total time to detection δ_t can be calculated as δ_t = Σ_{i=1}^{|S|} t_i. For insider diverseness, we consider, for each filter f_i ∈ F_i in the sequence, how many distinct filter families are represented; here F is a filter family, f a filter, and S the sequence.
We normalized all three objective values using Eq. 7 and inverted the time data, so all objectives are to be maximized. Our fitness function is denoted by max f(S) = (α_n(S), β_n(S), γ_n(S)), where α_n is the normalized accuracy, β_n the normalized inverse time, and γ_n the diverseness factor. We use a penalty function to prioritize simpler solutions and weight values to speed up the GA process. We observed that α is low in the beginning and that γ gets lower after a certain number of iterations; we therefore use W_0 as the weight for α and W_1 for γ.
The penalty function requires the following parameters: the length of the best-fitted individual in the previous iteration, |max f(S_i) ∈ ∀(S)|; the size of the current sequence, |S|; and the total number of filters, Σ_i |f| ∈ ∀F. The penalty value is denoted by Eq. 8, and the fitness for S then follows from Eq. 8.
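A scalarized sketch of the normalization-and-weighting step. The paper's Eq. 7 normalization and Eq. 8 penalty are not reproduced verbatim, so min-max normalization, the weights, and the omission of the penalty term are assumptions:

```python
def min_max_normalize(values):
    # Assumed min-max form of the Eq. 7 normalisation.
    lo, hi = min(values), max(values)
    if hi == lo:
        return [1.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

def fitness(accuracies, times, diversities, w_acc=1.0, w_div=1.0):
    # Time is negated before normalising so all three objectives are
    # maximised; w_acc/w_div stand in for the paper's W0/W1 weights.
    acc_n = min_max_normalize(accuracies)
    time_n = min_max_normalize([-t for t in times])
    div_n = min_max_normalize(diversities)
    return [w_acc * a + t + w_div * d
            for a, t, d in zip(acc_n, time_n, div_n)]

# A sequence that is more accurate, faster, and more diverse scores higher.
scores = fitness([0.9, 0.5], [1.0, 2.0], [0.8, 0.3])
```

The actual MOGA keeps the three objectives separate in a Pareto sense rather than summing them; the scalar score here is only to make the normalization and time inversion concrete.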

Crossover, mutation and selection
We used an elitist strategy with rank selection [28], keeping the best-performing filter sequences for the next generation in a steady-state genetic search. We used PMX crossover, as the order of the filter sequence is an important optimization criterion [88]. In Fig. 4, we illustrate the genetic search for near-optimal filter sequences, where the search terminates after a specific number of iterations or if the fitness values do not improve for a long period, i.e., a threshold number of iterations.
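The elitist rank-selection step can be sketched as below; PMX crossover itself is omitted, and the elite count and rank-weight scheme are illustrative choices:

```python
import random

def rank_select(population, fitnesses, n_elite=2):
    # Elitist rank selection: the top n_elite individuals survive
    # unchanged, and the remaining parent slots are drawn with
    # probability proportional to rank (best rank -> largest weight).
    ranked = sorted(zip(population, fitnesses),
                    key=lambda pair: pair[1], reverse=True)
    elites = [ind for ind, _ in ranked[:n_elite]]
    weights = list(range(len(ranked), 0, -1))
    parents = random.choices([ind for ind, _ in ranked], weights=weights,
                             k=len(population) - n_elite)
    return elites, parents

random.seed(1)
pop = ["seq1", "seq2", "seq3", "seq4"]
fits = [0.9, 0.7, 0.5, 0.1]
elites, parents = rank_select(pop, fits)
```

The elites carry the best sequences forward unchanged (the steady-state element), while rank-proportional sampling keeps selection pressure gentler than raw fitness-proportional selection.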

One-class classification outlier method
There is a different latent space for each class to detect that class's outliers. In Fig. 5, we can see that the MNIST digits have their own cluster for each class label, and these are well separable. In Fig. 6, we can see that filter-based metrics can very easily differentiate between adversarial and clean samples. We suggest using an ensemble of different outlier detection methods, for example, a combination of one-class SVM, isolation forest, and a negative selection algorithm. Our experimental results show that the negative selection algorithm is a randomized non-linear learning system applicable to adversarial detection, and its randomness makes it easy to keep the system adaptive by regularly updating the learning model. In Table 4, we report the attack-sample detection accuracy with V-detectors generated using different numbers of clean samples; it is observed that the V-detector's performance increases slightly after each iteration. In Table 5, we compare the V-detector NSA results with other techniques. We can see that OCSVM and IF perform better than the NSA for gradient-based attacks, while for low-noise attacks the NSA outperforms both of them. The variational autoencoder did not perform well due to the low number of samples, and the SO-GAAL- and MO-GAAL-based techniques also failed to work with so few samples.

Adaptiveness and dynamic selection
We randomly choose a different filter sequence and a different outlier method for each input to keep the system dynamic. After each input, the outlier detection modules are updated by changing their decision boundaries, making the detection adaptive. The filter sequences and the noise threshold values are updated after each MOGA run. This makes common adaptive attacks ineffective, as each input continuously updates the defense strategy. An adaptive attacker will first send random clean inputs, then start adding noise to these inputs and send them repeatedly until the classification result changes; that way the attacker learns the decision boundary of the learning model. The attacker then starts creating inputs that are close to the decision boundary in the representation space. In our method, attackers have to bypass our dynamic, changing adversarial detection method, whose decision boundary is not tied to that of the actual learning model. If our filter set or outlier detection method were static, this attack would work, but the dynamic selection of the filter set and outlier detection method makes it hard to formulate the adaptive attack. Additionally, after a certain number of inputs, we regenerate the negative detector sets by considering these new inputs as self data. Thus, the entire outlier decision parameterization changes, and the adaptive attacker is not able to establish a fixed decision boundary between adversarial and non-adversarial input data, as the adaptive attacker is looking for the class-classification boundary, not the adversarial/non-adversarial decision boundary. Because this update method can itself be vulnerable to adaptive attacks that aim to bias the method's accuracy, we added an adaptive attack detection module before updating/retraining our adversarial detection techniques.
In the adaptive attack detection module, we analyze whether the distribution of the last several inputs aligns with the overall distribution of inputs. We use the Kolmogorov-Smirnov goodness-of-fit test (K-S test), which compares data with a known distribution and tells us whether they share the same distribution. This test is nonparametric, as it does not assume any particular underlying distribution [11]. The Kolmogorov-Smirnov test sets up a null hypothesis, H_0, that the two samples originate from the same distribution. We then look for evidence that this hypothesis should be rejected, formulated as a probability ρ. If the probability of the samples being from different distributions exceeds a confidence level, we reject the null hypothesis and accept hypothesis H_1, which states that the two samples are from different distributions. Based on the K-S distribution table, if the test statistic exceeds the critical value 1.22/√n (where n is the number of stored inputs), then the stored inputs include inputs from an adaptive attack. We disregard those samples, as they may create data bias in our defense learning system.
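The stored-input check can be sketched with SciPy's two-sample K-S test. Reducing each input to one scalar feature, the window size, and the use of ks_2samp (rather than a one-sample test against a fitted distribution) are simplifying assumptions:

```python
import numpy as np
from scipy.stats import ks_2samp

def adaptive_attack_suspected(recent, history):
    # Two-sample K-S statistic between the last n inputs and the full
    # input history; 1.22/sqrt(n) approximates the critical value
    # (alpha ~= 0.10) used in the text.
    stat, _p = ks_2samp(recent, history)
    return stat > 1.22 / np.sqrt(len(recent))

rng = np.random.default_rng(0)
history = rng.normal(0.0, 1.0, size=2000)    # overall input distribution
shifted = rng.normal(1.5, 1.0, size=100)     # drifted recent query window
flagged = adaptive_attack_suspected(shifted, history)
```

A recent window whose distribution has drifted away from the history (as repeated boundary-probing queries would produce) is flagged, and those samples are excluded from retraining.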
In summary, our adaptive defense mechanism has the following properties:
- Dynamic selection of the filter-set sequence, which makes it harder to formulate an adaptive attack based on knowledge of the filters.
- Dynamic selection of the outlier detection method, which forces an adaptive attack to account for every outlier detection method when crafting inputs, making input generation computationally expensive.
- The defense is always learning, continually changing the filter sequences and the decision boundaries of the outlier detection models, which makes it difficult for an adaptive attack to search for the decision boundary.
- To protect against continuous query-based attacks, we monitor and analyze input trends using the K-S test. The number of inputs considered for the K-S test is dynamic, so formulating a query-based attack on the defense is hard due to the randomness of the K-S sample size. Our input-trend detection can effectively monitor adaptive attacks and take countermeasures.
These defense properties make state-of-the-art adaptive attacks ineffective and make it computationally harder to formulate new adaptive attacks.

Dataset generation
We carried out comprehensive experiments with MNIST and CIFAR-10. For MNIST we tested extensively on all 10 classes with the full dataset; for CIFAR we tested with two classes. We then evaluated our method on the EMNIST, Fashion-MNIST, and IMAGENET datasets, which re-validated our methodology. To test the results we generated FGSM, JSMA, and CW samples. We generated 100,000 FGSM samples using LeNet-5, a CNN architecture made up of 7 layers: 3 convolutional layers, 2 subsampling layers, and 2 fully connected layers. For JSMA we generated 100,000 samples using a 5-layer CNN consisting of 3 convolutional layers, 1 flatten layer, and 1 dense layer, with ReLU activations throughout. Finally, we generated 100,000 CW samples using the VGG-16 neural net.
To establish the ground truth for our research we used 30,000 clean image samples and 10,000 FGSM, 10,000 JSMA, and 10,000 CW attack samples on the MNIST dataset. For the filtering operation we picked 14 filters from the Python OpenCV library: MedianBlur, GaussianBlur, AverageBlur, BilateralBlur, AdditivePoissonNoise, AdditiveGaussianNoise, Erode, TopHat, BlackHat, morphology gradient, opening, closing, and Dilate. We apply each filter to the image, extract the difference between the original and the filtered image, and then measure the average of the differences. In Fig. 7b we visualized how adversarial (FGSM + JSMA + CW) inputs of one class label overlap with other class labels, compared with Fig. 7a where only clean inputs are present. This shows that adversarial samples are hard to distinguish between class labels. In Fig. 8 we plotted all inputs by adversarial attack type alongside the clean inputs, shown in blue. The FGSM samples, visualized in red, do not overlap much with the clean samples or the other attack types, but JSMA and CW overlap heavily with each other and partially with the clean samples; CW inputs overlap with the clean samples the most.
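The filter-then-difference feature extraction can be illustrated as follows. For self-containment this sketch substitutes a simple NumPy mean filter for the OpenCV calls, and the SNR and histogram computations are plausible realizations of the two metrics, not the paper's exact definitions:

```python
import numpy as np

def box_blur(img, k=3):
    """Simple mean filter (stand-in for cv2.blur); pads edges by reflection."""
    pad = k // 2
    padded = np.pad(img.astype(float), pad, mode="reflect")
    out = np.zeros(img.shape, dtype=float)
    for dy in range(k):
        for dx in range(k):
            out += padded[dy:dy + img.shape[0], dx:dx + img.shape[1]]
    return out / (k * k)

def filter_features(img, k=3):
    """Apply a filter, subtract from the original, and summarize the residual
    with the two kinds of metrics used in the paper: an SNR-style ratio
    and a histogram of residual values."""
    residual = img.astype(float) - box_blur(img, k)
    # SNR-style metric: signal power over residual (noise) power, in dB.
    snr = 10 * np.log10(np.mean(img.astype(float) ** 2) /
                        (np.mean(residual ** 2) + 1e-12))
    hist, _ = np.histogram(residual, bins=16, range=(-255, 255))
    return snr, hist / hist.sum()
```

Running this over every filter in the set yields one (SNR, histogram) pair per filter, which together form the feature vector fed to the detectors.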
In Fig. 9 we show the adversarial and clean samples after applying the 14 filters. Fig. 9a plots the SNR values of the images: FGSM samples are very easily separable, but JSMA and CW are hard to separate using SNR values alone; JSMA is somewhat separable, while CW and clean samples overlap completely. Fig. 9b uses the histogram values, which make CW more separable from the clean samples. In Fig. 10 we applied the SNR and histogram metrics together, and it is visible that adversarial and clean samples become much easier to separate. In Fig. 10b, blue points are clean samples and red points are adversarial samples; some clean samples overlap with adversarial ones, but the reverse is rare. In Fig. 10a we show the adversarial attack types; some CW samples also overlap with clean samples, but the overlap is negligible.
In Fig. 11 we present, for each class label, the adversarial and clean data both without and with adversarial classification. (Fig. 9 caption: Experimental data representation space after filters are applied, with one metric; here clean is green, red is FGSM, blue is JSMA, and yellow is CW. Fig. 10 caption: Experimental data representation space after filters are applied, with two metrics.) This visual presentation shows that when we treat each class as inlier and all others as outliers, adversarial samples are more easily detectable.
In Table 6 we used six different learning methods to differentiate clean inputs from the adversarial attack types using image pixel information; it is evident that random forest performs better than the others. In Table 7 we converted this to a binary problem in which inputs are classified only as clean or adversarial. The SVM performed very poorly, as the representation space is not linear, but random forest again outperformed the other methods. However, when we applied SNR- and histogram-feature-based classification, all methods except SVM began to perform well, and the neural network started to outperform the other methods, as presented in Tables 8 and 9.
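A sketch of the binary clean-vs-adversarial classification with a random forest on SNR + histogram features; the data here are synthetic Gaussian stand-ins used only to show the training setup, not the paper's datasets or results:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Hypothetical feature rows: [SNR, 16 histogram bins] per image,
# labelled 0 = clean, 1 = adversarial. Values are synthetic.
rng = np.random.default_rng(0)
clean = rng.normal(loc=5.0, scale=1.0, size=(500, 17))
adv = rng.normal(loc=3.0, scale=1.0, size=(500, 17))
X = np.vstack([clean, adv])
y = np.array([0] * 500 + [1] * 500)

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
clf = RandomForestClassifier(n_estimators=100, random_state=0).fit(X_tr, y_tr)
acc = clf.score(X_te, y_te)  # binary clean/adversarial accuracy
```

Swapping `RandomForestClassifier` for an SVM or a small neural network reproduces the kind of comparison reported in Tables 7-9.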
In Table 10 we present how correctly the different class labels are identified using SNR- and histogram-value-based checking, with random-forest learning. Some classes are harder to identify than others; for example, identifying the adversarial class is harder for input labels 2 and 3 than for input label 9.

Experiment with CIFAR and IMAGENET
In Table 11 we compared v-detector performance on the MNIST digits (0-9), as illustrated in Fig. 11, and on 4 classes of the CIFAR-10 dataset. Our results show that the v-detector consistently outperforms the other outlier detectors for all attack types and datasets. (Fig. 11 caption: Experimental data representation space for each class of MNIST digits with adversarial attack; here clean is blue, red is FGSM, green is JSMA, and yellow is CW.) In Table 12 we present results from experiments similar to those used to establish the ground truth. Our performance on CIFAR and IMAGENET is very good compared with the state-of-the-art attacks. Moreover, a good portion of the false positives were failed adversarial examples whose perturbations were lost when converted to physical form. These results verify that the same filters and the histogram- and SNR-based methods are applicable to all datasets of the same domain. We also tried to formulate a BPDA attack against our defense but were unable to do so.
When evaluating our defense against advanced attacks (with very low noise/perturbation), we observed that because all advanced attack types aim to reduce the perturbation, its magnitude becomes so small that it vanishes in the rounded values when the image is converted to visual form. Kurakin, Goodfellow, and Bengio describe this phenomenon with the destruction rate [47]. Our results on the IMAGENET dataset are also affected by this phenomenon.
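For reference, the destruction-rate expression from [47] can be written as follows (reconstructed from that paper, whose notation we borrow here: C(X, y) is an indicator that image X is classified as y, the bar denotes negation, and T is the transformation to physical/visual form):

```latex
d = \frac{\sum_{k=1}^{n} C(X^{k}, y^{k}_{\mathrm{true}})\,
          \overline{C(X^{k}_{\mathrm{adv}}, y^{k}_{\mathrm{true}})}\,
          C(T(X^{k}_{\mathrm{adv}}), y^{k}_{\mathrm{true}})}
         {\sum_{k=1}^{n} C(X^{k}, y^{k}_{\mathrm{true}})\,
          \overline{C(X^{k}_{\mathrm{adv}}, y^{k}_{\mathrm{true}})}}
```

That is, d is the fraction of initially successful adversarial examples that are no longer misclassified after the transformation T.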

Comparison with other methods
There are two primary approaches to defending against adversarial samples: proactive and reactive. A reactive defense detects the adversarial example before it enters the ML model; a proactive approach makes the ML model itself better at assigning the right class to the adversarial example rather than the targeted class [29,84]. Defense techniques against adversarial methods can be summarized in three types:
- Denoising strategy or gradient masking: try to remove the distortions of the image.
- Basic adversarial training: train the neural network with adversarial examples.
- Ensemble methods: add multiple neural networks with transformed datasets and combine a majority result.
In some adversarial defense techniques, well-known robust recognition models are trained on adversarial inputs proactively, perform defensive distillation, or train the network with enhanced training data, all to create protection against adversarial examples [34,57,66]. Histogram-based methods are also used to detect adversarial input [68]. In 2017, [19] tested ten defense techniques; through detailed evaluation they showed that pre-processing techniques can be easily bypassed. In Table 13 we compare our results with other techniques; our defense's performance is similar to theirs, but our technique has several advantages: it does not modify the ML model, it is very hard to mount an adaptive attack against, and it does not reduce ML model efficiency; instead, results are re-verified, which improves trustworthiness. However, the efficiency of our approach largely depends on the individual accuracy of the outlier detection methods and noise detection filter sequences.
Adversarial training diminishes the ML model's accuracy and can hurt the model's generalization [69]. Another disadvantage of adversarial-training-based defenses is that the model must be retrained whenever new attack samples are discovered, and it is hard to update all deployed ML models. Our strategy requires no such dataset, nor does it change the ML model in any way, so it has no effect on ML model performance. Most preprocessing techniques reduce the adversarial effect before the input reaches the ML model; their major drawback is that the processing is static and does not evolve alongside the attack. Our strategy updates itself, so it is not vulnerable to this type of adaptive attack, and it also includes a detection module that can recognize adaptive-attack queries. Distillation techniques combine two models, the second using the first model's knowledge to improve accuracy. Recent improvements in black-box attacks have made this defense out of date [22]; the strong transfer potential of adversarial samples across neural network models [66] is the main reason for its collapse, and it is not robust, since a simplistic variation in a neural network can expose the system to attacks [18]. The advantage of our approach over defensive distillation is that we do not need to modify the neural network; our approach does not need to know or change any ML model layer, so it remains the same for both black-box and white-box attack methods. [39] concluded that combining/ensembling weak defenses does not automatically improve a system's robustness; moreover, an ensemble remains static and vulnerable to new attacks. Our proposed solution selects the defense technique (filter method and outlier detection method) dynamically, so it is robust, and its auto-updating decision boundaries also defend against query-based attacks.
The feature-squeezing method [95] reduces the data, and with it the accuracy of the ML model; our proposed solution causes no such reduction in actual model accuracy. [72] proposed a mechanism that leverages the power of Generative Adversarial Networks to decrease the efficiency of adversarial perturbations, but GAN efficiency depends on GAN training, which is computationally complicated and needs proper datasets, whereas our system needs no complicated training method. In summary, any commercial product that uses advanced machine/deep/reinforcement learning can benefit from our innovative DF technique:
- Use of the commutative dual-filtering technique in any AI/ML-based utility application.
- Use of negative filtering will prevent Trojan AI from changing decisions, resulting in robust AI/ML systems.
- Ease of incorporation into existing and future ML systems will increase adoption and deployability.
- Enhanced performance/accuracy and robustness of ML products and online services will increase their use in diverse applications.
- Improved security will improve users' quality of experience.

Conclusions
We have designed a dual-filtering strategy that requires no modification to the ML model and no information from inside it. Our strategy can be implemented in any ML-based system without costly pre-training. It is to be noted that current adaptive attacks are ineffective against our DF defense, since our strategy verifies both the inputs of the ML model and its output with non-obvious, diverse inspection and secondary (outlier) detection. Empirical results show that it can increase the trustworthiness of ML-based applications. Our experiments were primarily in the computer vision domain, but our DF technique is also suitable for other domains (audio, text, time series). Future work will expand our experiments to different domains and enrich our filter ensemble for better performance. We plan to release this filter collection as a library with our DF framework so that secure learning systems can be developed and deployed. Our technique can be suitably tuned for speed and accuracy; moreover, being independent of the ML model makes the DF framework suitable for privacy-preserving applications.

Conflict of interest
The authors declare that they have no conflict of interest.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

A.1 Gradient-based attacks
This method computes an adversarial image by adding a pixel-wise perturbation of magnitude ε in the direction of the gradient. The perturbation is computed in a single step and is therefore very efficient in terms of computation time [34]. A simple formulation is

x' = x + ε · sign(∇_x J(θ, x, y)),

where x' is the adversarial example, which should look similar to x when ε is small, y is the label associated with x, ε is a small constant that controls the magnitude of the perturbation, and J denotes the loss function of the model.
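A minimal numeric sketch of the FGSM step, using a logistic-regression model so the gradient ∇_x J can be written in closed form; the model and values are illustrative, not from the paper:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fgsm(x, y, w, b, eps):
    """One FGSM step x' = x + eps * sign(dJ/dx) for a logistic model.
    For J = cross-entropy(sigmoid(w.x + b), y), the input gradient is
    dJ/dx = (p - y) * w, where p is the predicted probability."""
    p = sigmoid(w @ x + b)
    grad_x = (p - y) * w
    return x + eps * np.sign(grad_x)
```

For example, with w = [1, -1], b = 0, x = [2, 0], and true label y = 1, the attack pushes x in the direction that lowers the model's confidence in class 1.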

Optimization-based attack
The Carlini and Wagner method differs from the gradient-based methods above in that it is an optimization-based attack that constructs adversarial examples by approximately solving a minimization problem [95]. The loss function of the CW attack can be stated as

f(x') = max( max_{i ≠ t} Z(x')_i − Z(x')_t , −k ),

where Z(x') denotes the logits (the outputs of the neural network before the softmax layer) for the adversarial input x', t is the target misclassification label (the label we want the adversary to be misclassified as), and k is a constant that controls the desired confidence score.
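The CW loss term can be computed directly from the logits; a small sketch assuming the formulation above, with k the confidence constant:

```python
import numpy as np

def cw_loss(logits, target, k=0.0):
    """Carlini-Wagner loss f(x') = max(max_{i != t} Z_i - Z_t, -k).
    Negative values mean the target class t already dominates every other
    logit by more than k, i.e. the misclassification is achieved."""
    z_t = logits[target]
    z_other = np.max(np.delete(logits, target))
    return max(z_other - z_t, -k)
```

The full attack minimizes this term plus a distance penalty ||x' − x||; this sketch shows only the misclassification term.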

A.2 Score-based attack
Most adversarial images are constructed by perturbing all pixels under an overall constraint on the accumulated modification, which the attacker tries to keep as small as possible. In a one-pixel (or few-pixel) attack, by contrast, the attacker changes one or a few pixels as much as necessary to convert the image into an adversarial one [77]. Differential evolution (DE), a population-based optimization algorithm for solving complex multi-modal optimization problems, is used here.
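A toy DE search for a one-pixel perturbation might look like the following; the DE/rand/1 mutation scheme, population size, and score function are illustrative choices, not the exact settings of [77]:

```python
import numpy as np

def one_pixel_attack(img, score_fn, steps=50, pop=20, rng=None):
    """Toy differential-evolution search for one (row, col, value)
    perturbation that minimizes score_fn (e.g. the true-class probability)."""
    if rng is None:
        rng = np.random.default_rng(0)
    h, w = img.shape
    # Each candidate encodes one pixel edit: (row, col, new_value).
    popn = np.column_stack([rng.integers(0, h, pop),
                            rng.integers(0, w, pop),
                            rng.uniform(0, 255, pop)])

    def apply(c):
        out = img.copy()
        out[int(c[0]) % h, int(c[1]) % w] = np.clip(c[2], 0, 255)
        return out

    scores = np.array([score_fn(apply(c)) for c in popn])
    for _ in range(steps):
        for i in range(pop):
            a, b, c = popn[rng.choice(pop, 3, replace=False)]
            trial = a + 0.5 * (b - c)          # DE/rand/1 mutation
            s = score_fn(apply(trial))
            if s < scores[i]:                   # greedy selection
                popn[i], scores[i] = trial, s
    best = popn[np.argmin(scores)]
    return apply(best), scores.min()
```

In a real attack, `score_fn` would query the target classifier's confidence in the true class; the search never needs gradients, which is what makes the method black-box.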

A.3 Decision-based attack
The basic decision-based AA algorithm initializes from a point that is already adversarial and then performs a random walk along the boundary between the adversarial and non-adversarial regions such that it satisfies the following criteria: (1) it stays in the adversarial region.

Appendix B: Adversarial defense
Angelova and Abu-Mostafa [6] used pruned training sets for learning object categories, applied with bootstrap and the Naïve Bayes algorithm. Brückner and Scheffer's use of game theory in 2011 also shows a diverse approach to developing input filters [15]. Goodfellow et al. [34] trained on adversarial inputs proactively, Papernot et al. performed defensive distillation [66], and Miyato et al. trained the network with enhanced training data, all to create protection against adversarial examples [57]. Grosse et al. [35] performed statistical tests using a complementary approach to identify specific inputs that are adversarial. Wong et al. showed that a convex outer adversarial polytope can be a provable defense [92]. Lu et al. [52] checked whether the depth map is consistent (images only) to detect adversarial examples. Metzen et al. implemented deep neural networks with a small "detector" sub-network trained on the binary classification task of distinguishing factual data from data containing adversarial perturbations [56]. The same year, Madry et al. [55] published a paper on the adversarial robustness of neural networks through the lens of robust optimization. Chen et al. tried to counter adversarial examples with another guardian neural net, using distillation as a defense from AAs [24]. Wu et al. [93] developed highly confident near neighbor (HCNN), a framework that combines confidence information and nearest-neighbor search to reinforce the adversarial robustness of a base model. Paudice et al. [67] applied anomaly detection, and Zhang et al. detected adversarial examples by identifying the pixels significant for prediction, which works only for images [98]. Other researchers, such as Wang et al., tried mutation testing [89], and Zhao et al. developed a key-based network, a new detection-based defense mechanism that distinguishes adversarial examples from normal ones based on error-correcting output codes, using the binary code vectors produced by multiple binary classifiers applied to randomly chosen label-sets as signatures to match standard images and reject adversarial examples [99]. Later that year, Liu et al. tried steganalysis [50], and Katzir et al. implemented a filter by constructing Euclidean spaces out of the activation values of each deep-neural-network layer with k-nearest-neighbor (k-NN) classifiers [44]. A notably different strategy was taken by Pang et al., who used a thresholding approach as the detector to filter out adversarial examples for reliable predictions [63]. For an image classification problem, Tian et al. used image transformation operations such as rotation and shifting to detect adversarial examples [82], and Xu et al. [95] simply reduced the feature space to protect against the adversary. Monteiro et al. [59] developed an input filter based on the bi-model decision mismatch of an image. Sumanth Dathathri checked whether prediction behavior is consistent with a set of fingerprints (a dataset of the NN) in the NFP method [31]. The same year, Crecchi et al. used non-linear dimensionality reduction and density estimation techniques [27], and Aigrain et al. used confidence values in CNNs [3]. Other notable work that year included a meta-learning-based robust detection method to detect new AAs with limited examples, developed by Ma et al. [54]. Another important and effective work was by Chen et al., who kept records of queries and used KNN to correlate them with adversarial examples [25].