Uncertainty estimation based adversarial attack in multi-class classification

Model uncertainty has gained popularity in machine learning because standard neural networks produce overconfident predictions that are not trustworthy. Recently, Monte-Carlo based adversarial attack (MC-AA) has been proposed as a simple uncertainty estimation method that is powerful in capturing data points lying in the overlapping regions of class distributions near the decision boundary. MC-AA produces uncertainties by performing back-and-forth perturbations of a given data point towards the decision boundary, using the idea of adversarial attacks. Despite its efficacy compared with other uncertainty estimation methods, this method has only been examined on binary classification problems. We therefore present and examine MC-AA on multi-class classification tasks. We point out a limitation of this method with multiple classes, which we tackle by converting the multi-class problem into ‘one-versus-all’ classification. We compare MC-AA against other recent model uncertainty methods on Cora, a graph-structured dataset, and MNIST, an image dataset, using a variety of deep learning models to perform the classification. Against the other uncertainty estimation methods, the best model uncertainty results are obtained on Cora with the LEConv model (AUC score 0.889) and on MNIST with a CNN (AUC score 0.98).


Introduction
Machine learning has attracted broad interest worldwide through applications such as healthcare [6,10,17], blockchain [3,41], cyber-security [9,16], and self-driving cars [25,32]. However, machine learning models with softmax outputs often produce overconfident predictions, which can lead to poor decision-making regardless of the models' performance. In Fig. 1, an image of the digit "2" from the MNIST dataset is predicted as "7" with high confidence. This confusion is easy to interpret, yet the model misclassifies the given example confidently. More serious scenarios arise in self-driving cars, where erroneous yet confident predictions lead to catastrophic decisions [36]. In May 2016, a fatal accident was caused by the autopilot feature of a Tesla Model S, which failed to distinguish a white tractor-trailer against a bright sky. Other applications are also subject to misleading predictions, such as misclassifying legitimate transactions as illicit in anti-money laundering [4]. Such faults of the learning model should be detected, analysed, and avoided. Consequently, model uncertainty is needed alongside the model's predictions to support reliable decision-making. Model uncertainty in machine learning has become an emerging topic following the resurgence of Bayesian approximation in [12]. In classification tasks, there are two main types of uncertainty, known as epistemic and aleatoric uncertainty. Epistemic uncertainty occurs when a new data point is not covered by the information in the training instances. Aleatoric uncertainty is observed when data points fall in regions shared by different class distributions, resulting in wrong predictions. The former can be reduced with more data, while the latter is unavoidable unless more informative features are added to obtain a better classification rule. 
Gal et al. [12] tackled the computational complexity of Bayesian neural networks by proposing Monte-Carlo dropout, a Bayesian approximation method that efficiently produces uncertainty estimates in machine learning models. A comprehensive review of recent advances in uncertainty quantification methods is given by Abdar et al. [1]. In brief, uncertainty quantification methods have been designed either to detect data points that the model was not trained on, as in deterministic uncertainty quantification (DUQ) [37], or to capture data points that fall near the decision boundary, as in Deep Ensembles [20] and Monte-Carlo dropout (MC-dropout) [13]. DUQ flags out-of-distribution test points that lie far from the training data. Deep Ensembles and MC-dropout both rely on an ensemble of models to produce uncertainty. The former combines models with different initialised parameters, whereas the latter is an ensemble of models with shared parameters obtained by means of dropout. Despite their efficiency in modelling uncertainty, these methods have a significant drawback [2,5]: they fail to detect points that fall in the overlapping regions of class distributions. As shown in [2], data points falling in the overlapping regions of multiple class distributions are not affected by the variability of the decision boundary. That study introduced Monte-Carlo adversarial attack (MC-AA), an uncertainty method that perturbs the given data point in the direction of the decision boundary. The perturbed inputs are computed using an adversarial attack method, where multiple perturbed input samples are generated along a line through the original input. These samples then produce uncertainty following the same procedure as MC-dropout. MC-AA has shown its capability in capturing model uncertainty by forcing data points to travel between two given class distributions. 
Despite its promising performance, MC-AA has only been studied on binary classification problems. Motivated by the previous work in [2], we propose a generalisation of MC-AA that can be applied to multi-class classification. Applying MC-AA to multi-class classification is challenging because there are multiple decision boundaries. In other words, it is not clear in which directions the multiple back-and-forth perturbations of a given data point should be performed.
In this study, we propose a generalised MC-AA that performs multiple back-and-forth perturbations towards the decision boundary associated with the predicted class of a given data point. We conduct our experiments using a variety of deep learning models on the Cora and MNIST datasets as two multi-class classification problems. We then evaluate and compare the model uncertainty obtained with MC-AA against recent uncertainty estimation methods. The presented MC-AA reveals limitations in some cases, which we discuss in the experiments. We address these limitations by converting multi-class classification into one-versus-all binary classification. Overall, we show the competence of the presented MC-AA in capturing uncertainty compared with other uncertainty methods on the given datasets. The paper is organised as follows: Section 2 reviews related work. Section 3 provides the background of the previously proposed MC-AA on binary classification problems. Section 4 presents our generalised MC-AA for multi-class classification tasks. Section 5 describes the conducted experiments and results. A discussion and a conclusion are given in Sections 6 and 7, respectively.

Overview of related works
Neural networks with distributions over the weights first emerged as Bayesian neural networks (BNNs), studied by Neal et al. [28,29] and MacKay et al. [24]. BNN models have seen a resurgence in recent years [7,15,18] and have achieved significant success in modelling the uncertainty of neural networks. However, this approach suffers from a prohibitive computational cost [12]. Consequently, Gal et al. [13] proposed the MC-dropout method as an approximation of the Bayesian approach, which interprets dropout as variational inference. MC-dropout is a simple and efficient method that applies dropout [35] after each hidden layer to produce uncertainty estimates in neural networks [19]. To produce uncertainty, dropout is kept active during the testing phase [13]. A data point is thus subjected to multiple stochastic versions of a perturbed decision boundary, which reflects the uncertainty about its predictions. An adaptive version of MC-dropout later appeared in [14], where the dropout parameter is optimised with respect to a given objective function. The Deep Ensemble method is another Bayesian approximation, which uses an ensemble of multiple neural network models with different initialisations [20]. The study in [30] showed that the Deep Ensemble method outperforms BNNs. However, this method performs poorly on simple 2D synthetic data [37]. The latter study introduced the deterministic uncertainty quantification (DUQ) model, which reliably captures out-of-distribution data, i.e., data points distant from the training data. DUQ is a deep model that learns feature representations. The distance between these features and centroids derived from the training data is assessed using a kernel function. A model that uses a radial basis function (RBF) kernel in this way is known as an RBF network [22]. 
Other uncertainty estimation methods also exist, such as approximate variational inference with Gaussian processes [38], DropConnect as another version of MC-dropout [27], and uncertainty estimation based on evidential deep learning [40]. MC-dropout and Deep Ensemble methods perturb the decision boundary between different class distributions and have shown promising results in capturing model uncertainty. However, these methods fail to capture data points that fall in the overlapping region of class distributions. This issue was tackled in [2] by proposing MC-AA. This uncertainty estimation method uses the idea of adversarial attacks to perform back-and-forth perturbations of a given data point towards the decision boundary. Adversarial attacks have been studied extensively across machine learning, for example to improve classification [23] and to defend against adversarial examples [34,39]. MC-AA, in contrast, uses them to identify data points lying near the decision boundary of neural network models, so that noisy points are detected with high uncertainty. MC-AA has significantly outperformed other methods in producing reliable predictive uncertainties in binary classification problems [2]. We therefore conduct experiments on multi-class classification datasets using various deep learning models to capture model uncertainty with MC-AA. We also compare its performance against the MC-dropout, DUQ and Deep Ensemble methods.

Methods for quantifying uncertainty
In this section, we present the methods used in our experiments to quantify uncertainty of deep learning models.

Monte-Carlo based adversarial attack: MC-AA
Adversarial examples are inputs crafted to fool the decision of a neural network [8]. In white-box attacks, an adversarial example can be obtained with the Fast Gradient Sign Method (FGSM) as:

$$x_{\text{Adv}} = x + \epsilon \cdot \operatorname{sign}\left(\nabla_x J(x, y)\right)$$

Here, $x_{\text{Adv}}$ is the crafted input known as an adversarial example, and $\epsilon$ is a small scalar between 0 and 1. $J$ is the loss function of the model, $\nabla_x$ is the gradient with respect to the initial input $x$, and $y$ is the desired class label. Moreover, sign is the sign function, which returns 1 for positive values and −1 for negative ones.
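The following is a minimal NumPy sketch of the FGSM step above. It is illustrative only and is not the authors' implementation. It assumes a hypothetical linear softmax classifier with weights `W` and bias `b`, and takes $J$ to be the negative log-likelihood. For this loss the input gradient has the closed form $W^\top(p - \text{onehot}(y))$.

```python
import numpy as np


def softmax(z):
    """Numerically stable softmax over the last axis."""
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def nll_input_gradient(x, y, W, b):
    """Gradient of J(x, y) = -log softmax(W x + b)[y] with respect to x."""
    p = softmax(W @ x + b)
    one_hot = np.zeros_like(p)
    one_hot[y] = 1.0
    return W.T @ (p - one_hot)


def fgsm(x, y, W, b, eps):
    """FGSM: x_adv = x + eps * sign(grad_x J(x, y))."""
    return x + eps * np.sign(nll_input_gradient(x, y, W, b))


# Hypothetical 3-class model on 4 input features (random weights, for illustration only)
rng = np.random.default_rng(0)
W = rng.normal(size=(3, 4))
b = np.zeros(3)
x = rng.normal(size=4)
y = int(np.argmax(W @ x + b))  # use the model's own prediction as the label

x_adv = fgsm(x, y, W, b, eps=0.1)
p_before = softmax(W @ x + b)[y]
p_after = softmax(W @ x_adv + b)[y]
print(f"p(y|x) = {p_before:.3f}, p(y|x_adv) = {p_after:.3f}")
```

The loss of this linear model is convex in $x$, so a positive `eps` always lowers the probability of class `y`. With deep networks this holds only approximately, and only for small `eps`.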
The study in [2] proposed MC-AA, an uncertainty estimation method based on the idea of adversarial attacks, which produces uncertainty estimates alongside the predictions of a neural network. MC-AA uses FGSM to perturb the inputs during the testing phase with multiple values of ϵ that belong to a small symmetric interval around zero. FGSM requires a class label, and it shifts the input in the direction opposite to that class. In [2], MC-AA passes an arbitrary class label (i.e., 0 or 1) to FGSM for its given inputs. Multiple perturbed versions of each input are then produced from multiple values of ϵ.
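For multi-class classification, we modify MC-AA by passing the predicted class of each input to FGSM, as provided in Algorithm 1. This modification accounts for multiple classes: the given input is perturbed towards/away from the direction of its predicted class regardless of the other classes. Following [2], the predictive mean is computed as:

$$p_{\text{MC-AA}}(y \mid x) = \frac{1}{T}\sum_{i=1}^{T} \hat{y}_{\epsilon_i}$$

Here, $\hat{y}_{\epsilon_i}$ is the output associated with the perturbed input $x_{\epsilon_i}$ at $\epsilon_i$, with $i = 1, 2, \dots, T$. $T$ is the total number of produced outputs and is a hyper-parameter to be tuned. To obtain the predictive uncertainty, we use mutual information (MI), computed as:

$$\mathrm{MI} = -\sum_{c} p_{\text{MC-AA}}(y = c \mid x) \log p_{\text{MC-AA}}(y = c \mid x) + \frac{1}{T}\sum_{i=1}^{T}\sum_{c} \hat{y}_{\epsilon_i, c} \log \hat{y}_{\epsilon_i, c}$$

where $c$ is the class label, and $\hat{y}_{\epsilon_i, c}$ is the component of $\hat{y}_{\epsilon_i}$ for class $c$.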
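To make the multi-class procedure concrete, here is a hedged NumPy sketch of MC-AA under the following assumptions:

- a hypothetical linear softmax model;
- the FGSM label set to the model's own predicted class, as described above;
- a single gradient sign computed at the original input;
- `T` values of ϵ spaced evenly over `[-eps_max, eps_max]`.

The names `mc_aa`, `eps_max` and `T` are illustrative. Algorithm 1 may differ in details such as how the interval is sampled.

```python
import numpy as np


def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def mutual_information(probs):
    """MI from a (T, C) array of class probabilities: H[mean] - mean(H)."""
    p_mean = probs.mean(axis=0)
    entropy_of_mean = -np.sum(p_mean * np.log(p_mean + 1e-12))
    mean_entropy = -np.mean(np.sum(probs * np.log(probs + 1e-12), axis=1))
    return entropy_of_mean - mean_entropy


def mc_aa(x, W, b, eps_max=0.2, T=21):
    """Multi-class MC-AA for a linear softmax model (illustrative sketch)."""
    logits = W @ x + b
    y_pred = int(np.argmax(logits))               # FGSM label = predicted class
    one_hot = np.eye(len(logits))[y_pred]
    grad = W.T @ (softmax(logits) - one_hot)      # grad_x of the NLL for y_pred
    direction = np.sign(grad)
    eps_values = np.linspace(-eps_max, eps_max, T)  # symmetric interval around 0
    # Back-and-forth perturbations: negative eps pushes towards y_pred,
    # positive eps pushes away from it, towards the decision boundary
    probs = np.stack([softmax(W @ (x + e * direction) + b) for e in eps_values])
    return probs.mean(axis=0), mutual_information(probs)


rng = np.random.default_rng(1)
W, b = rng.normal(size=(4, 5)), np.zeros(4)  # hypothetical 4-class model
x = rng.normal(size=5)
p_mean, mi = mc_aa(x, W, b)
print("predictive mean:", np.round(p_mean, 3), "MI:", round(float(mi), 4))
```

Setting `eps_max=0` collapses all `T` outputs to the same prediction, so MI becomes zero. This is a quick sanity check that the uncertainty comes entirely from the perturbations.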

Monte-Carlo dropout: MC-dropout
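In this method, dropout is kept active during the testing phase and is applied after each weight layer in a neural network. Consider a neural network with input $x$ and observation $y$, trained on $D_{\text{train}}$, with $L$ layers and learnable parameters $w$. Its predictive distribution can be written as:

$$p(y \mid x, D_{\text{train}}) = \int p(y \mid x, w)\, p(w \mid D_{\text{train}})\, dw$$

where $p(y \mid x, w)$ is the likelihood of the model and $p(w \mid D_{\text{train}})$ is the posterior distribution over the weights. Following [13], the posterior distribution, which is intractable, can be approximated by $q(w)$ by minimising the Kullback-Leibler divergence. By variational inference, the approximate predictive distribution becomes:

$$q(y \mid x) = \int p(y \mid x, w)\, q(w)\, dw$$

Gal et al. [13] chose the approximate posterior $q(w)$ as a distribution over the matrices of learnable weights whose connections are randomly dropped out. This is realised by applying dropout during the testing phase. In other words, $q(w)$ can be defined as:

$$W_i = M_i \cdot \operatorname{diag}\left(\left[z_{i,j}\right]_{j=1}^{K_{i-1}}\right), \qquad z_{i,j} \sim \operatorname{Bernoulli}(p_i)$$

Here, $z_{i,j}$ are the realisations drawn from the Bernoulli distribution for $i = 1, \dots, L$ and $j = 1, \dots, K_{i-1}$. $M_i$ is the $K_i \times K_{i-1}$ matrix of variational parameters of layer $i$, $K_i$ is the width of layer $i$, and $p_i$ is the probability of keeping a unit. Drawing $T$ samples from the Bernoulli distribution produces $\{\hat{W}^t_1, \dots, \hat{W}^t_L\}_{t=1}^{T}$. The approximate predictive mean of a given input can then be expressed as:

$$p_{\text{MC}}(y \mid x) = \frac{1}{T}\sum_{t=1}^{T} p\left(y \mid x, \hat{W}^t_1, \dots, \hat{W}^t_L\right)$$

Hence, the predictive uncertainty using mutual information can be expressed as:

$$\mathrm{MI} = -\sum_{c} p_{\text{MC}}(y = c \mid x) \log p_{\text{MC}}(y = c \mid x) + \frac{1}{T}\sum_{t=1}^{T}\sum_{c} p(y = c \mid x, \hat{w}_t) \log p(y = c \mid x, \hat{w}_t)$$

where $c$ is the class label and $\hat{w}_t = \{\hat{W}^t_1, \dots, \hat{W}^t_L\}$ is the $t$-th sampled set of weights.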
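The following NumPy sketch illustrates the MC-dropout estimates above under these simplifying assumptions:

- a hypothetical two-layer ReLU network with random weights;
- dropout applied to the hidden units at test time, which is equivalent to zeroing the corresponding columns of the next weight matrix;
- inverted-dropout rescaling by `1 / (1 - p_drop)`, as common frameworks do.

Here `p_drop` is the probability of dropping a unit. All names are illustrative.

```python
import numpy as np


def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def stochastic_forward(x, W1, W2, p_drop, rng):
    """One forward pass with dropout kept active (test-time dropout)."""
    h = np.maximum(0.0, W1 @ x)                # ReLU hidden layer
    z = rng.random(h.shape) >= p_drop          # Bernoulli keep-mask, P(keep) = 1 - p_drop
    h = h * z / (1.0 - p_drop)                 # equivalent to W2 @ diag(z), with rescaling
    return softmax(W2 @ h)


def mc_dropout(x, W1, W2, p_drop=0.5, T=100, seed=0):
    """Predictive mean p_MC and mutual information over T stochastic passes."""
    rng = np.random.default_rng(seed)
    probs = np.stack([stochastic_forward(x, W1, W2, p_drop, rng) for _ in range(T)])
    p_mc = probs.mean(axis=0)
    mi = (-np.sum(p_mc * np.log(p_mc + 1e-12))
          + np.mean(np.sum(probs * np.log(probs + 1e-12), axis=1)))
    return p_mc, mi


rng = np.random.default_rng(2)
W1, W2 = rng.normal(size=(16, 5)), rng.normal(size=(3, 16))  # hypothetical weights
x = rng.normal(size=5)
p_mc, mi = mc_dropout(x, W1, W2)
print("p_MC:", np.round(p_mc, 3), "MI:", round(float(mi), 4))
```

With `p_drop=0.0` every pass is identical and MI vanishes. This makes explicit that MC-dropout's uncertainty comes from the sampled weights, whereas MC-AA's comes from the perturbed inputs.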

Deterministic uncertainty quantification (DUQ)
DUQ consists of a feature extractor as a base model, followed by an additional learnable layer that maps the features to one vector per class. Predictions are made with a kernel function, namely the RBF kernel, which computes the distance between these vectors and the class centroids. The centroid of each class is updated using an exponential moving average, with momentum $\gamma$, of the vectors of the data points belonging to that class. The predictive uncertainty is obtained in a single deterministic forward pass. Following [37], the output of a DUQ model can be expressed as:

$$K_c\left(f_\theta(x), e_c\right) = \exp\left(-\frac{\frac{1}{n}\left\lVert W_c f_\theta(x) - e_c \right\rVert_2^2}{2\sigma^2}\right)$$

Here, $f_\theta$ is the feature extractor with learnable parameters $\theta$, mapping an input $x$ of dimension $m$ to a feature vector of dimension $d$. $W_c$, for a class $c$, is a weight matrix of size $n \times d$ belonging to the additional layer. It transforms the output of the feature extractor into a new embedding space of the same size $n$ as the centroids. $K_c$, the kernel output, is computed for each class centroid $e_c$, with $\sigma$ being a hyperparameter called the length scale. The prediction of this model is:

$$\hat{c} = \arg\max_{c} K_c\left(f_\theta(x), e_c\right)$$

The predictive uncertainty is obtained from the maximum kernel value, where a larger value indicates higher certainty:

$$\max_{c} K_c\left(f_\theta(x), e_c\right)$$

The loss function is the binary cross-entropy over the kernel outputs, with $y_c$ the one-hot encoded label:

$$\mathcal{L}(x, y) = -\sum_{c} \left[ y_c \log K_c + \left(1 - y_c\right) \log\left(1 - K_c\right) \right]$$

Moreover, the model is further regularised with a two-sided gradient penalty (based on the $\ell_2$ norm), whose regularisation factor $\lambda$ is tuned:

$$\lambda \left[ \left\lVert \nabla_x \sum_{c} K_c \right\rVert_2^2 - 1 \right]^2$$
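Below is a minimal NumPy sketch of the DUQ kernel output, the prediction, and the exponential-moving-average centroid update. It assumes that:

- the per-class matrices $W_c$ are stacked into one array of shape `(C, n, d)`;
- the feature extractor output $f_\theta(x)$ is already computed.

The function names and the default `gamma=0.99` are illustrative. Training, that is, the loss and the gradient penalty, is omitted.

```python
import numpy as np


def duq_kernel(features, W, centroids, sigma):
    """K_c = exp(-(1/n) * ||W_c f - e_c||^2 / (2 sigma^2)) for every class c.

    features:  (d,)      feature vector f_theta(x)
    W:         (C, n, d) one weight matrix W_c per class
    centroids: (C, n)    one centroid e_c per class
    """
    n = centroids.shape[1]
    embeddings = np.einsum("cnd,d->cn", W, features)   # W_c f_theta(x) for all c
    sq_dist = np.sum((embeddings - centroids) ** 2, axis=1)
    return np.exp(-(sq_dist / n) / (2.0 * sigma ** 2))


def update_centroids(m, N, W, feature_batch, labels, gamma=0.99):
    """EMA update: N_c <- gamma N_c + (1-gamma) n_c; m_c <- gamma m_c + (1-gamma) sum_i W_c f_i."""
    for c in range(W.shape[0]):
        f_c = feature_batch[labels == c]                # (k, d) features of class c
        if len(f_c) == 0:
            continue
        N[c] = gamma * N[c] + (1.0 - gamma) * len(f_c)
        m[c] = gamma * m[c] + (1.0 - gamma) * np.einsum("nd,kd->n", W[c], f_c)
    return m / N[:, None]                               # centroids e_c = m_c / N_c


rng = np.random.default_rng(3)
C, n, d = 3, 8, 5
W = rng.normal(size=(C, n, d))
centroids = rng.normal(size=(C, n))
f = rng.normal(size=d)
K = duq_kernel(f, W, centroids, sigma=0.5)
prediction = int(np.argmax(K))      # predicted class
certainty = float(np.max(K))        # higher means more certain
print("K:", np.round(K, 4), "prediction:", prediction, "certainty:", round(certainty, 4))
```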

Deep ensemble
Deep Ensemble is a collection of deep models with different initialisations. Training multiple models with distinct initial weights produces multiple outputs for a given input, as in MC-dropout but with independent parameters. The predictive mean is then:

$$p_{\text{ensemble}}(y \mid x) = \frac{1}{N}\sum_{i=1}^{N} M_i(x)$$

where $M_i(x)$ is the prediction of model $M_i$ on a given input $x$ and $N$ is the number of models. The predictive uncertainty is obtained using Eqs. 9 and 10, replacing $p_{\text{MC}}$ with $p_{\text{ensemble}}$.
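The sketch below computes the ensemble predictive mean and its mutual information. It assumes each ensemble member is any callable returning a probability vector. The toy members are hypothetical, independently initialised linear softmax models, left untrained purely to show the aggregation.

```python
import numpy as np


def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def ensemble_predict(models, x):
    """p_ensemble(y|x) = (1/N) sum_i M_i(x), with MI computed as for MC-dropout."""
    probs = np.stack([model(x) for model in models])   # (N, C)
    p_ens = probs.mean(axis=0)
    mi = (-np.sum(p_ens * np.log(p_ens + 1e-12))
          + np.mean(np.sum(probs * np.log(probs + 1e-12), axis=1)))
    return p_ens, mi


def make_member(seed, n_classes=3, n_features=4):
    """Hypothetical ensemble member: a linear softmax model with its own random init."""
    W = np.random.default_rng(seed).normal(size=(n_classes, n_features))
    return lambda x: softmax(W @ x)


members = [make_member(seed) for seed in range(5)]
x = np.ones(4)
p_ens, mi = ensemble_predict(members, x)
print("p_ensemble:", np.round(p_ens, 3), "MI:", round(float(mi), 4))
```

MI is zero when all members agree and grows as they disagree. Two confident members voting for opposite classes therefore yield a high MI.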

Evaluating model uncertainty
To evaluate the quality of model uncertainty, we follow the same procedure as in [26]. In this procedure, the predictive mean together with the predictive uncertainty reflects the model uncertainty.
The predictive mean determines whether a classification is correct or incorrect with respect to the actual labels. The predictive uncertainty is derived from the MI measurement, where an arbitrary threshold $T_u$ is set to classify MI as certain or uncertain. Combining the predictive mean with the uncertainty yields four states that resemble a binary classification task, as provided in Table 1. For simplicity, MI measurements are min-max normalised with respect to the test set, so $T_u$ is an arbitrary threshold between 0 and 1. The abbreviations are defined as follows:

- TN (true negatives): correct and certain.
- FN (false negatives): incorrect and certain.
- FP (false positives): correct and uncertain.
- TP (true positives): incorrect and uncertain.

Referring to Table 1, higher TN and TP are desired, with lower FP and FN. FN is particularly harmful to the quality of uncertainty, as it corresponds to erroneous predictions made with high certainty. FP should preferably also be low, but it is less critical for model uncertainty because uncertain yet correct examples can be forwarded to an annotator. These measurements can be written as conditional probabilities to assess the performance of model uncertainty, including:
• Negative predictive value: $\mathrm{NPV} = P(\text{correct} \mid \text{certain}) = \frac{TN}{TN + FN}$
• True positive rate: $\mathrm{TPR} = P(\text{uncertain} \mid \text{incorrect}) = \frac{TP}{TP + FN}$
• Accuracy of model uncertainty: $A_u = \frac{TN + TP}{TN + TP + FN + FP}$

Furthermore, the threshold $T_u$ can be moved between 0 and 1. At each threshold, the TPR and the false positive rate $\mathrm{FPR} = \frac{FP}{FP + TN}$ give one point of the Receiver Operating Characteristic (ROC) curve. The Area Under the Curve (AUC) score of this curve evaluates the quality of model uncertainty.
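The evaluation procedure can be sketched in NumPy as follows, assuming Boolean correctness flags and raw MI values for the test set. The sketch proceeds in three steps:

1. It min-max normalises MI.
2. It counts the four states of Table 1 at a threshold $T_u$.
3. It sweeps $T_u$ from 1 to 0 to trace the ROC curve (TPR against FPR) and integrates it with the trapezoidal rule.

The function names and the 101-point threshold grid are illustrative choices.

```python
import numpy as np


def confusion_states(correct, mi, t_u):
    """TN, FP, FN, TP, where 'uncertain' means min-max normalised MI > t_u."""
    correct = np.asarray(correct, dtype=bool)
    mi = np.asarray(mi, dtype=float)
    span = mi.max() - mi.min()
    mi_norm = (mi - mi.min()) / span if span > 0 else np.zeros_like(mi)
    uncertain = mi_norm > t_u
    tn = int(np.sum(correct & ~uncertain))   # correct and certain
    fp = int(np.sum(correct & uncertain))    # correct and uncertain
    fn = int(np.sum(~correct & ~uncertain))  # incorrect and certain
    tp = int(np.sum(~correct & uncertain))   # incorrect and uncertain
    return tn, fp, fn, tp


def uncertainty_metrics(correct, mi, t_u):
    """A_u, NPV and TPR at a single threshold t_u."""
    tn, fp, fn, tp = confusion_states(correct, mi, t_u)
    a_u = (tn + tp) / (tn + fp + fn + tp)
    npv = tn / (tn + fn) if tn + fn else float("nan")
    tpr = tp / (tp + fn) if tp + fn else float("nan")
    return a_u, npv, tpr


def uncertainty_auc(correct, mi, n_thresholds=101):
    """ROC-AUC of MI as a detector of incorrect predictions."""
    fprs, tprs = [0.0], [0.0]
    for t_u in np.linspace(1.0, 0.0, n_thresholds):   # decreasing threshold
        tn, fp, fn, tp = confusion_states(correct, mi, t_u)
        fprs.append(fp / (fp + tn))
        tprs.append(tp / (tp + fn))
    fprs.append(1.0)
    tprs.append(1.0)
    area = 0.0
    for i in range(1, len(fprs)):                       # trapezoidal rule
        area += (fprs[i] - fprs[i - 1]) * (tprs[i] + tprs[i - 1]) / 2.0
    return area


# Toy example: incorrect predictions carry the highest MI
correct = np.array([True, True, True, False, False, False])
mi = np.array([0.0, 0.1, 0.2, 0.8, 0.9, 1.0])
print(uncertainty_metrics(correct, mi, t_u=0.5))
print(uncertainty_auc(correct, mi))
```

The AUC sweep assumes the test set contains both correct and incorrect predictions. Otherwise, FPR or TPR is undefined.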

Experiments and results
In our experiments, we apply different machine learning models to graph and image datasets, namely Cora and MNIST, respectively. We then estimate uncertainties alongside the predictions to evaluate and compare the different uncertainty estimation methods. We use PyTorch [31] and the PyTorch Geometric package [11] in the Python programming language.

Experimenting with graph data
As an example of graph data, we use the Cora dataset to assess the proposed uncertainty method. Cora is a graph-structured dataset that comprises academic publications as nodes and citations as edges [33]. It is used in node classification tasks, where each node is classified into one of seven subjects. Each node feature is a binary (0/1) indicator of the absence/presence of the corresponding word in a dictionary. The number of features therefore equals the number of unique words in the dictionary. The data is described in Table 2. In this paper, we follow the same experimental setup for this data as in [42].
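Since Cora is graph-structured data, we choose various graph neural network models to perform node classification. The graph learning models are arbitrarily chosen as follows:
• GCN: Graph Convolutional Network based on the spectral approach. A GCN layer can be expressed as:
$$h_i' = \Theta \sum_{j \in \mathcal{N}(i) \cup \{i\}} \frac{e_{j,i}}{\sqrt{\hat{d}_j \hat{d}_i}}\, h_j$$
• GraphConv: Graph Convolutional Network based on the spatial approach. A GraphConv layer can be expressed as:
$$h_i' = \Theta_1 h_i + \Theta_2 \operatorname{mean}_{j \in \mathcal{N}(i)} \left(e_{j,i}\, h_j\right)$$
• GAT: Graph Attention Network. A GAT layer can be expressed as:
$$h_i' = \alpha_{i,i}\,\Theta h_i + \sum_{j \in \mathcal{N}(i)} \alpha_{i,j}\,\Theta h_j$$
where the attention coefficient $\alpha_{i,j}$ is expressed as:
$$\alpha_{i,j} = \frac{\exp\left(\operatorname{LeakyReLU}\left(a^{\top}\left[\Theta h_i \,\Vert\, \Theta h_j\right]\right)\right)}{\sum_{k \in \mathcal{N}(i) \cup \{i\}} \exp\left(\operatorname{LeakyReLU}\left(a^{\top}\left[\Theta h_i \,\Vert\, \Theta h_k\right]\right)\right)}$$
• TAGConv: Topology Adaptive GCN. It is written as:
$$H' = \sum_{k=0}^{K} \left(D^{-1/2} A D^{-1/2}\right)^{k} H\, \Theta_k$$
The symbols are defined as follows:

- $h_i$ is the embedding of input node $i$ in the hidden layer, and $h_i'$ is its updated embedding; $H$ and $H'$ stack these embeddings row-wise.
- $\Theta$, $\Theta_1$, $\Theta_2$ and $\Theta_k$ are learnable weight matrices, where $\Theta_k$ is associated with the $k$-hop term in TAGConv and $K$ is the number of hops.
- $a$ is a learnable attention vector, and $\Vert$ denotes concatenation.
- $A$ and $D$ are the adjacency and degree matrices.
- $e_{i,j}$ is the edge weight, which is set to 1.
- mean is the average over the neighbourhood.
- $\hat{d}_i$ is the degree of node $i$, including its self-loop.
- $\mathcal{N}(i)$ is the set of nodes in the neighbourhood of node $i$.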
All hidden layers have a width of 16 neurons, dropout with a rate of 0.5 is applied after each hidden layer, and the number of epochs is set to 100. We use an unweighted negative log-likelihood loss (NLLLoss) and the Adam optimiser to train the given models. Each of the above models consists of two graph convolutional layers. All hidden layers use the ReLU activation. The output layers are followed by a softmax function, except for DUQ, which uses an RBF kernel as output. To capture model uncertainty, we use the MC-AA, MC-dropout, DUQ and Deep Ensemble methods. The hyper-parameters of these methods are tuned empirically and are summarised in Table 3.
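As an illustration of the spectral GCN layer with unit edge weights, here is a dense NumPy sketch for an undirected graph given as an edge list. The two-layer forward pass mirrors the setup described above (ReLU hidden layer, softmax output) but omits dropout and training. The toy graph, feature sizes and function names are hypothetical. Libraries such as PyTorch Geometric implement these layers with sparse message passing instead.

```python
import numpy as np


def normalised_adjacency(num_nodes, edges):
    """D^-1/2 (A + I) D^-1/2 for an undirected graph with unit edge weights."""
    A = np.zeros((num_nodes, num_nodes))
    for i, j in edges:
        A[i, j] = A[j, i] = 1.0          # e_{i,j} = 1
    A_hat = A + np.eye(num_nodes)        # add self-loops
    d_hat = A_hat.sum(axis=1)            # degree including the self-loop
    d_inv_sqrt = 1.0 / np.sqrt(d_hat)
    return d_inv_sqrt[:, None] * A_hat * d_inv_sqrt[None, :]


def gcn_layer(A_norm, H, Theta):
    """Row i: sum over j in N(i) and i itself of h_j Theta / sqrt(d_j d_i)."""
    return A_norm @ H @ Theta


def two_layer_gcn(A_norm, X, Theta1, Theta2):
    """GCN -> ReLU -> GCN -> softmax (dropout omitted in this sketch)."""
    H = np.maximum(0.0, gcn_layer(A_norm, X, Theta1))
    logits = gcn_layer(A_norm, H, Theta2)
    logits -= logits.max(axis=1, keepdims=True)
    probs = np.exp(logits)
    return probs / probs.sum(axis=1, keepdims=True)


rng = np.random.default_rng(4)
edges = [(0, 1), (1, 2), (2, 3)]                    # hypothetical 4-node path graph
A_norm = normalised_adjacency(4, edges)
X = rng.integers(0, 2, size=(4, 6)).astype(float)   # binary bag-of-words-style features
probs = two_layer_gcn(A_norm, X, rng.normal(size=(6, 16)), rng.normal(size=(16, 7)))
print(probs.argmax(axis=1))                         # predicted class per node (7 classes as in Cora)
```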
After computing MI, we plot $A_u$, NPV, TPR and the ROC curve to compare our proposed method with MC-dropout, as depicted in Figs. 2, 3, 4, 5, 6 and 7.

Experimenting with image data
For the MNIST dataset [21], the data is split into 60k/10k images for training/testing. We use the convolutional neural network (CNN) model that appeared in [37] as a deep feature extractor. The deep model consists of 3 convolutional layers with 64, 128 and 128 output channels, respectively, followed by a feed-forward layer of width 256. The kernel size is set to 3. Batch normalisation is applied after every convolutional layer, followed by 2 × 2 max pooling. The padding of the first two convolutional layers is set to 1. A dropout of 0.5 is empirically set after the feed-forward hidden layer. The learning rate is set to 0.001, chosen empirically. The output layer is followed by a softmax function that outputs the class prediction, which is one of the handwritten digits from 0 to 9. The batch size is arbitrarily set to 1024, and the model is trained for 30 epochs. This model attains an accuracy above 98% in classifying the digits. Likewise, we capture model uncertainty on the CNN model by performing MC-AA, MC-dropout, DUQ and Deep Ensemble, with the hyper-parameters summarised in Table 4. The 28 × 28 input images are grey-scale with pixel values in the range [−1, 1], so the inputs perturbed by MC-AA are clamped to this range. We plot the above model uncertainty metrics in Fig. 8. The performance of model uncertainty is computed by moving a threshold between 0 and 1 and computing the evaluation metrics provided earlier at each threshold.

Overall, MC-AA for multi-class classification shows competitive performance against other uncertainty estimation methods on the various graph learning models with Cora data, in terms of accuracy and the ROC-AUC plots in Figs. 2, 3, 4, 5, 6 and 7. Regarding the other measurements, NPV is highest with MC-AA among all methods for the GCN and TAGConv models. TPR shows acceptable results with MC-AA for the different graph learning models, except for LEConv. 
For all graph learning models, DUQ outperforms the other methods in the subplots corresponding to TPR. However, the same models show poor accuracy with DUQ, so its overall performance is deficient. Although MC-AA produces competent uncertainty estimates, it performs poorly with the LEConv model. We attribute this deficient performance to the multiple class distributions, which restrict the behaviour of MC-AA. In other words, FGSM in multi-class models might produce a perturbed input that cannot escape its relevant class. The perturbation is not directed exactly towards the decision boundary, because we use the sign of the gradients. The sign is the steepest-ascent direction under the L∞ norm and neglects the relative magnitudes of the gradient components across features. Therefore, the perturbation may not allow the given data point to fall into another class. For this reason, we propose to avoid this drawback of MC-AA by converting multi-class classification problems into one-versus-all binary classification, which proves more effective with MC-AA. We choose LEConv, which performed poorly with MC-AA. To convert to "one-versus-all" classification, we take the class with the highest number of false instances among all classes, which is class 4, as the positive class. The remaining labels are assigned to the negative class. The same concept can also be applied to the other one-versus-all binary classifications (e.g., first class versus others, second class versus others, etc.). The LEConv model is trained following the same experimental setup as before. We then capture model uncertainty using MC-AA (for binary classification), MC-dropout, DUQ and Deep Ensemble. The results are provided in Fig. 9.
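The label conversion can be sketched as follows. The sketch assumes that "false instances" are counted per true class, i.e., misclassified nodes grouped by their actual label. Both the helper name and this counting convention are assumptions.

```python
import numpy as np


def one_vs_all_labels(y_true, y_pred, num_classes):
    """Return the positive class (most misclassified instances) and binary labels."""
    y_true = np.asarray(y_true)
    wrong = y_true != np.asarray(y_pred)
    errors_per_class = np.bincount(y_true[wrong], minlength=num_classes)
    positive = int(np.argmax(errors_per_class))
    return positive, (y_true == positive).astype(int)   # 1 = positive class, 0 = rest


# Toy labels: class 1 has the most misclassified instances
y_true = np.array([0, 0, 1, 1, 1, 2])
y_pred = np.array([0, 1, 0, 2, 1, 2])
positive, y_binary = one_vs_all_labels(y_true, y_pred, num_classes=3)
print(positive, y_binary)
```

Passing a fixed `positive` class instead of the arg-max gives the other one-versus-all permutations mentioned above.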
After converting the classes of the Cora data into binary labels, MC-AA shows superior success against the other uncertainty methods. In particular, FN (incorrect and certain) is low. Here, the sign of the gradients in FGSM allows the data points to jump to the opposite class, as there are only two competing classes.

Uncertainty performance with MNIST data
Referring to Fig. 8, the model uncertainty of the CNN using MC-AA outperforms the other uncertainty methods on MNIST data. All methods attain the same overall accuracy, whereas NPV and TPR show superior success with MC-AA, where fewer FN (incorrect and certain) are obtained. To highlight the effectiveness of the uncertainty estimates, we plot the normalised density distribution of the predictive uncertainties in Fig. 10. The distributions are grouped according to correct/incorrect predictions, as in Table 1. All methods reflect good uncertainty estimates for correct predictions. However, the distributions of incorrect predictions differ between the uncertainty methods. With MC-AA, the density of incorrect predictions is concentrated towards higher values of mutual information, where these predictions are considered uncertain. With the other methods, the predictive uncertainty of incorrect predictions is spread over the whole mutual information scale. Moreover, the uncertainty estimates of the DUQ model show more overlap between the correct and incorrect densities, which is reflected in Fig. 9. In addition, we compute the mean and standard deviation of these distributions, as provided in Table 5. The mean uncertainty of incorrect predictions with MC-AA attains the highest value with the smallest standard deviation. This is desirable for an effective model uncertainty that classifies incorrect predictions as uncertain. On the other hand, all methods show adequately low means and standard deviations for correct predictions.

Fig. 6 LEConv model uncertainty. The subplots (from left to right) correspond to $A_u$, NPV, TPR and ROC curve as a function of threshold $T_u$

Fig. 7 TAGConv model uncertainty. The subplots (from left to right) correspond to $A_u$, NPV, TPR and ROC curve as a function of threshold $T_u$
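The per-group statistics of Table 5 and the normalised densities of Fig. 10 can be computed along the lines of the sketch below. It assumes MI values already min-max normalised to [0, 1] and an arbitrary choice of 20 histogram bins. The MI values in the example are synthetic draws from Beta distributions, used only to exercise the code. They are not results from this paper.

```python
import numpy as np


def mi_summary(mi, correct, bins=20):
    """Mean, standard deviation and normalised density of MI for correct/incorrect predictions."""
    mi = np.asarray(mi, dtype=float)
    correct = np.asarray(correct, dtype=bool)
    summary = {}
    for name, mask in (("correct", correct), ("incorrect", ~correct)):
        density, edges = np.histogram(mi[mask], bins=bins, range=(0.0, 1.0), density=True)
        summary[name] = {"mean": mi[mask].mean(), "std": mi[mask].std(),
                         "density": density, "edges": edges}
    return summary


rng = np.random.default_rng(5)
correct = rng.random(1000) < 0.9
mi = np.where(correct, rng.beta(1, 8, size=1000), rng.beta(6, 2, size=1000))  # synthetic MI values
stats = mi_summary(mi, correct)
for name in ("correct", "incorrect"):
    print(name, round(stats[name]["mean"], 3), round(stats[name]["std"], 3))
```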
To pinpoint the effectiveness of MC-AA on the MNIST image data, we provide a case study of an image example, depicted in Fig. 1, that the CNN predicts wrongly under all methods. We investigate the predicted class of this image of the digit two, as well as its uncertainty estimates from MC-AA, MC-dropout, DUQ and Deep Ensemble, as provided in Fig. 11. All methods erroneously predict this digit as seven with high confidence, whereas class two receives a low predicted probability.

Conclusion
We have extended MC-AA, an uncertainty estimation method based on adversarial attacks, to multi-class classification. By benchmarking MC-AA against other uncertainty estimation methods, we have shown its effectiveness in capturing the model uncertainty of deep models on the Cora graph data and the MNIST image data. In particular, MC-AA applied directly to multi-class classification significantly outperformed the other methods on MNIST without the need to convert to binary classification. The recorded AUC score of this method is 0.98, outperforming the other methods, and the same advantage appears in its NPV and TPR curves. To wrap up, MC-AA is powerful in reducing the number of false negatives of model uncertainty (i.e., data points that are incorrect but certain). We attribute this to the perturbations being performed at the input level, unlike previous uncertainty methods. The limitation of this study is that it is not known when MC-AA is a good approach in multi-class classification tasks. However, the conversion to binary classification is consistently promising, since MC-AA tends to perturb an input between decision boundaries. Consequently, having a single decision boundary leads to effective results. With multiple classes, the perturbed input can be misled and then fails to reflect a good uncertainty estimate when the FGSM sign method is used in MC-AA.

Fig. 9 Model uncertainty of LEConv as binary classification (class 4 vs rest) using MC-AA and MC-dropout. The subplots (from left to right) correspond to $A_u$, NPV, TPR and ROC curve as a function of threshold $T_u$

Fig. 10 Distribution of predictive uncertainty measurements derived from the MNIST test set using MC-AA, MC-dropout, DUQ and Deep Ensemble. The correct predictions curve is the uncertainty measurements of the data points that are predicted correctly. Similarly, the incorrect predictions curve is the uncertainty measurements of the data points that are predicted incorrectly
In future work, we foresee replacing the L∞-based FGSM in MC-AA with a perturbation based on a more effective norm, so that the gradient directions point more accurately towards the class boundary. This could lead to more accurate perturbations towards the class boundary and eventually produce better uncertainty estimates.
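The sketch below illustrates why the sign operation discards information. It compares the FGSM (L∞) step with an L2-normalised gradient step, as used by the fast gradient method. The L2 variant is only one possible choice of norm, and it is an assumption here, not a method evaluated in this paper. The gradient vector is hypothetical.

```python
import numpy as np


def linf_step(grad, eps):
    """FGSM-style step: every feature moves by exactly eps; gradient ratios are lost."""
    return eps * np.sign(grad)


def l2_step(grad, eps):
    """L2-normalised step: keeps the gradient direction, scaled to L2 norm eps."""
    norm = np.linalg.norm(grad)
    return eps * grad / norm if norm > 0 else np.zeros_like(grad)


def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


grad = np.array([10.0, 0.1, -0.01])   # hypothetical gradient dominated by one feature
for name, step in (("L-inf", linf_step(grad, 0.5)), ("L2", l2_step(grad, 0.5))):
    print(name, "step:", step, "cosine with gradient:", round(cosine(step, grad), 3))
```

The L∞ step moves the two negligible features as far as the dominant one, so its direction deviates noticeably from the gradient. The L2 step stays aligned with the gradient by construction.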

Declarations
Conflict of interest The authors declare that they have no conflict of interest.
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ .