1 Introduction

The COVID-19 pandemic has devastated healthcare facilities and treatment systems in every country. The virus is disseminated by direct contact with contaminated respiratory droplets (produced by sneezing and coughing). Anyone who touches virus-contaminated surfaces and then their face may become ill, with symptoms such as shortness of breath, cough, and fever. The WHO recommends wearing a face mask in crowded and public locations to minimize viral transmission through the nasal or oral passages [12, 14, 62]. Governments of COVID-19-affected nations have passed laws mandating the wearing of face masks. As a result, large groups of individuals in public spaces and congested locations must be monitored for face mask violations. Conventional procedures for checking face mask violations rely on human monitoring, which is not always feasible and is error-prone.

Deep neural network (DNN) techniques have made significant progress over the past decade, showing promise in a variety of classification tasks [33, 55] and pattern recognition [26]. In the fight against the COVID-19 pandemic, artificial intelligence, and more specifically machine and deep learning, has proved to be a successful and crucial tool. It has been beneficial for the early prediction of infection from patients' historical data [5]. In addition, deep learning models have been proposed to detect COVID-19 from readily available chest X-ray or CT images [27]. As discussed earlier, wearing a mask is the most effective way to curb the rapid spread of COVID-19. However, ensuring that masks are worn properly in public or crowded places is difficult with conventional approaches, which rely on human monitoring. Artificial intelligence offers a solution in the form of computer-aided face mask detection systems, in which methodologies primarily based on deep learning are applied to produce rapid, accurate, and trustworthy systems. A large number of researchers have developed a variety of DNN architectures to detect face mask violations. Most of these methods are based on transfer learning [61], while some introduce novel architectures. Deep learning models have shown high sensitivity and specificity when detecting face mask violations.

Deep learning models need a significant amount of data to train effectively. When given a new unseen input sample, the model infers the corresponding output during the inferencing phase. For example, once the face mask detection model is trained on the dataset, it is given unseen data and returns a classification result: with mask, no mask, or incorrect mask. Deep learning systems hold several confidential and valuable assets, including the training data, the trained model, and the model parameters. However, recent research has proven that these assets are vulnerable to various attacks during the training and inferencing phases, allowing attackers to undermine the model's security and privacy. Attacks on deep learning algorithms are broadly categorized into four types: adversarial attacks, data poisoning attacks, model inversion attacks, and model extraction attacks. Data poisoning is performed before the model's training process [6]: the training data is contaminated using various techniques, such as adding mislabeled data or trigger patches to the input data, with the aim of forcing the model to misclassify. A model inversion attack concerns the confidentiality of the data used for the model's training [31]. In a model extraction attack [28], the attacker tries to obtain the internal confidential details of the model, such as its parameters, hyperparameters, and architecture. In an adversarial attack, input samples are modified with small, skillfully crafted perturbations that cause the model to misclassify during inferencing [53]. In computer vision, adversarial attacks are primarily applied to classification models during the inferencing phase. In Fig. 1, we illustrate the various attacks on deep learning models in their respective phases.

Fig. 1 Various attacks on the deep learning model's assets

The existence of adversarial attacks calls into question the widespread deployment of deep learning models for face mask detection, especially in security-sensitive settings. Although various deep learning-based face mask detection systems have been proposed, none of them discusses the possible vulnerabilities of the system or how to mitigate them. This study reveals the vulnerabilities of conventional deep learning-based face mask detection systems and proposes a comparatively more robust system against adversarial attacks. As part of this research, we first developed a custom dataset for training the proposed face mask detection model. The model is based on transfer learning of the state-of-the-art classification model MobileNetV2 [54] and classifies images into three classes (incorrect mask, with mask, and no mask) with an accuracy of about 96%. We then assessed the vulnerability of the developed model to adversarial examples. In particular, we attacked the model with the untargeted Fast Gradient Sign Method (FGSM) [17]. Adversarial samples were computed for several very small settings of the FGSM epsilon parameter, and the model's performance was evaluated. The resulting adversarial images are imperceptibly different from the original images, yet they force the face mask model to misclassify with alarmingly high confidence. The results show that the attack significantly degrades the model's performance. To enhance the model's resilience against adversarial attacks, we implemented an adversarial training strategy with slight alterations [15]. Specifically, instead of retraining the model on the entire dataset, we trained it on a randomly chosen subset of the training data along with the corresponding adversarial images. This modified training approach improved the model's performance on adversarial examples.

1.1 Basic concepts

In this section, we discuss some essential terminology related to attacks on deep learning-based face mask detection models. Attacks can be classified according to whether the attacker has access to the model's internals, giving two types: white-box and black-box attacks. Additionally, attacks can be classified according to whether the model is compelled to produce a particular label, giving targeted and untargeted attacks.

  • White box Attack Approach: The attacker has full access to the system, say ‘F’, i.e., the entire information of the model, including its gradients, parameters, hyperparameters, and training dataset. The adversary is provided with the classifier ‘F’ in the white-box threat model. Gradient descent on the adversarial loss, or an approximation thereof, is a powerful attack approach in this scenario. To ensure that the changes remain undetectable, one can regulate the perturbation norm ‖δ‖2 either by stopping the loss optimization early or by incorporating the norm directly as a regularizer or constraint in the loss optimization.

  • Black box Attack Approach: The attacker has only partial knowledge of the model. Gradients, parameters, and the training set might not be accessible or known to the attacker. Arguably, many white-box assumptions are impractical in real-world situations. For example, the model ‘h’ may be accessible to the public only via an Application Programming Interface (API) that accepts input queries, depriving the attacker of access to the model’s internal details.

  • Targeted Attack Approach: The face mask detection system is compelled to output a specific target label. For example, given an input image of a person wearing a mask, the model may be forced to output an unmasked face.

  • Untargeted Attack Approach: It is entirely focused on the misclassification of a model, regardless of the output.

1.2 Contribution

  • The custom face mask dataset was developed and made publicly available on a KAGGLE repository (https://www.kaggle.com/datasets/shiekhburhan/face-mask-dataset).

  • We developed a deep learning-based face mask detection model using the conventional approach. The model is based on transfer learning of MobileNetV2 and performed well on unseen clean data samples, with an accuracy of 95.85%.

  • We demonstrated that the conventional model is susceptible to adversarial attacks by exposing it to the FGSM attack.

  • We proposed the framework for a robust face mask detection model that is resistant to adversarial attacks such as FGSM.

1.3 Paper organization

Following the introduction, Section 2 reviews the literature on face mask detection and adversarial attacks. The proposed dataset and its comparison with other standard datasets are described in Section 3. The methodology, which is broken down into three parts (training a face mask classifier, attacking it with FGSM, and improving the robustness of the face mask model), is described in Section 4. Section 5 discusses the experimental results in detail. Section 6 presents the conclusion, and Section 7 outlines the future scope of the study.

2 Related work

The literature review has been split into two sections. First, we will go through the existing literature on face mask detection systems, and then we will discuss the literature on adversarial attacks on COVID-19 monitoring systems, including face mask detection systems.

2.1 Face mask detection models

Due to the ongoing pandemic, there has been considerable interest in projects with similar goals. Among the approaches reported in the literature, most researchers utilize convolutional neural networks (CNNs) because of their outstanding performance and their capacity to extract valuable features from image data. Other works have employed hybrid strategies that combine machine learning methodologies with or without deep learning.

2.1.1 CNN-based approaches

Contrary to ML approaches, we do not need to manually extract the features in CNN-based methods. CNN uses convolution and pooling techniques to extract valuable features from the input. We have discussed some popular CNN-based face mask detection models in the following.

In [16], the MAFA (Masked Faces) dataset was first introduced. The authors built a CNN model capable of detecting facial occlusion, including masks. Their approach has three key components: a proposal module, an embedding module, and a verification module. The first module combines two CNNs and retrieves facial image characteristics. The second module focuses on detecting facial landmarks that are not obscured by occlusion; the LLE algorithm is implemented at this stage [50]. In the final module, classification and regression tasks are carried out using a CNN to determine whether an item is a face and to estimate the position of missing facial cues. Identifying side-facing faces degraded the model's performance, and the dataset contains more occluded than masked faces, so training on this dataset alone is not always viable for face mask identification. Performance was determined by calculating the precision for each parameter setting and averaging these precisions, yielding an average precision of 74.6%.

In [8], similar to our model, MobileNetV2 was used to classify face masks, together with a Caffe-based face detector. A small dataset of 4,095 images was used. Additionally, the dataset has only two classes, masked and unmasked; hence, the trained model cannot detect incorrectly masked faces (e.g., a mask worn below the nose). It achieved a decent F1-score of 0.93.

In [3], a dataset known as the “MASKED FACE DATASET” was proposed and three cascaded CNN architectures were used for face mask detection. The dataset consists of only 200 images. To overcome overfitting, the authors used transfer learning and fine-tuned the model on the WiderFace dataset [63]. The first CNN consists of five layers and is used to scale the input image; the second and third CNNs consist of seven layers each. The advantage of using three cascaded CNNs is that false detections are progressively eliminated, making the prediction stronger; however, it also makes the approach computationally expensive. The model achieved an accuracy of 86.6% and a recall of 87.8% on their proposed dataset.

In [49], the authors presented the SRCNet model for face mask detection. The model comprises two networks: a classification network and an image super-resolution (SR) network. The model can adequately classify Incorrect Facemask-Wearing (IFW), Correct Facemask-Wearing (CFW), and No Facemask-Wearing (NFW). The model was trained on the MMD (Medical Masked) dataset [1] and adopted the MobileNetV2 CNN. The design was well organized and effective; however, the training dataset was very small and the face mask detection speed was slower than that of other algorithms. The model achieved an accuracy of 98.07%.

The approach in [24] identifies three classification categories: no-mask, improper-face-mask, and with-mask. The model was trained on a dataset consisting of 35 masked and unmasked face images. Before training, the dataset was preprocessed and scaled to the required dimensions. The model first detects the face, extracts it from the input, and then applies the face masknet model for classification. The data used is extremely limited and regionally specific. The accuracy of the model was reported to be 98.06%.

In [38], a novel face mask detection technique combining YOLOv2 [52] and ResNet50 [18] was proposed. The FMD [35] and MMD [1] datasets were used to train and test the model, and the SGDM and ADAM optimizers were compared [32]. The model achieved an average precision of 81%.

In [23], the VGG16 architecture was utilized to identify and categorize facial expressions [55]. Their VGG16 model, trained on the KDEF database, achieved an accuracy of 88%.

Transfer learning of the InceptionV3 model [56] was used in [29]. The last layer of the model was removed and five new trainable layers were added. The final layer consists of two neurons followed by a softmax activation function, corresponding to masked and unmasked faces, respectively. The model obtained a training accuracy of 99.91% and a testing accuracy of 100% in 80 epochs.

In [42], VGG-16 CNN was used for face mask detection. The dataset they developed consists of 25,000 images, and the model was trained on it. The mask-covered area in an image was first segmented and extracted. The proposed model used the Adam optimizer as an optimization function. Their algorithm was 96% accurate at spotting face masks.

The SSDMNV2 model is proposed in [44]. The authors used an approach similar to ours: a single-shot multi-box detector [37] for face detection and MobileNetV2 for classification. The classification accuracy was around 92% and the F1-score was 0.93. Our proposed system outperforms it with 98.6% training accuracy, 97% testing accuracy, and a 0.95 F1-score.

In [36], YOLOv3 was used for face detection. It was trained on the CelebA and WIDER FACE [63] databases. The model was later evaluated on the FDDB database [25] and achieved an accuracy of 93.9%.

2.1.2 Hybrid-based approaches

Deep learning and machine learning algorithms were combined in [39]. The deep learning model ResNet50 was employed for feature extraction, while machine learning methods such as Support Vector Machines and Decision Trees were used for classification. One of the four datasets used contains both real and fake face masks. When trained on the dataset containing real face masks, the decision tree classifier did not achieve a decent classification accuracy on fake face masks (68%).

In [45], a model was proposed that triggers an alarm for surgical face mask violations in the operating room. Viola-Jones face detection was used for face detection and LogitBoost for face mask detection [59]. One problem with the model was that it made mistakes when clothing was found near the face; synthetic rotation was used to address this issue. In addition, the model was trained only on surgical face masks. The recall was reported to be above 95%, with a false positive rate below 5%.

In [4], a Haar cascade feature detector was utilized to recognize the nose and mouth within the detected face [58]. If the model identifies both the nose and mouth, it predicts an unmasked face; if it detects only the nose, it predicts an incorrectly masked face; and if it detects neither, it predicts a correctly masked face. This method is quick and straightforward, but it can only interpret full-frontal faces and can be tricked by covering the mouth and nose. Our proposed model is able to predict correctly across different face orientations and under occlusions such as a hand or hair on the face.

The Principal Component Analysis (PCA) algorithm was implemented in [11] for face mask violation detection. It performed well, with an accuracy of 96.25% for detecting faces without a mask, but its performance dropped to 68.75% when detecting faces with a mask.

2.2 Adversarial attack on COVID-19 monitoring models

Since it was first shown in [57] that neural networks can be attacked with adversarial techniques, the study of such techniques has become a hot topic in artificial intelligence, with researchers continually proposing novel adversarial attack methods and mitigation techniques. However, little research has so far addressed the challenges posed by adversarial attacks on computer-based COVID-19 combating technologies.

In [19], both targeted and untargeted universal adversarial perturbation (UAP) attacks were conducted against the COVID-Net architecture [60]. COVID-Net is a model developed specifically for classifying COVID-19 patients based on chest X-ray images. The results revealed that COVID-Net is susceptible to tiny UAPs: perturbations with a norm of only about 2% of the average image norm achieved success rates of >85% and >90% for nontargeted and targeted attacks, respectively. Under nontargeted UAPs, the DNN models classify most chest X-ray images as COVID-19 cases, whereas targeted UAPs cause them to classify most chest X-ray images into a specific target class.

In [51], six different applications used in the fight against COVID-19 were studied, and an adversarial attack was proposed for each. The attacked applications were: 1) recognizing whether a subject is wearing a mask from a live camera feed; 2) maintaining DL-based QR codes as immunization certificates; 3) adding explainability to DL algorithms via GRAD-CAM; 4) recognizing COVID-19 from CT scan images; 5) detecting noninvasive biometrics and identifying social distancing from a live camera feed; and 6) recognizing COVID-19 from X-ray image analysis. Existing adversarial methods were tested in this study, including FGSM, MI-FGSM, DeepFool, L-BFGS, C&W, BIM, Foolbox, PGD, and JSMA [13].

Using image translation methods, the authors of [30] calculated adversarial perturbation [2] for face recognition systems. To fool the targeted face recognition system, they would take a source image and distort it to create any desired facial appearance. White-box testing showed a 90% success rate for the attack. With the help of adversarial instances and the dynamic distillation technique, the model was able to reach an 80% success rate in the black-box setting. The translated images were realistic to the human eye and kept the person’s identity intact, while the perturbations were significant enough to fool trained defenses.

In [46], a COVID-19 diagnosis model based on transfer learning was proposed, built on the state-of-the-art VGG and Inception classification models. When the perturbation was increased from 0.0001 to 0.09, the VGG16 model's accuracy plummeted by more than 90% on X-ray images, while the Inception-v3 network's accuracy declined by 30%. The FGSM attack similarly exposed vulnerabilities, such as misdiagnosis, in CT imaging. The authors also demonstrate that the degree of perturbation has a sizable impact on how perceptible the attacks are to people.

In [41], machine learning techniques including support vector machine, random forest, logistic regression, and naïve Bayes were implemented to classify chest X-ray images into viral pneumonia, COVID-19, and healthy cases. The models used 1,400 images collected from the Kaggle public repository. The experimental outcomes confirmed that the support vector machine had high accuracy and excellent performance in classifying the disease, with 91.8% accuracy, 91.7% sensitivity, 95.9% specificity, 91.8% F1-score, and 97.6% AUC.

3 Dataset description

We uploaded a new dataset, called the Sophisticated Face Mask Dataset, to the Kaggle public repository on 16 June 2022. The dataset contains many different types of images of people wearing face masks. We collected these images from many different sources so that the dataset would be unbiased and diverse. We used some images from other popular datasets such as Masked Faces (MAFA) [16] and the Masked Face Detection Dataset (MFDD), as well as some simulated images, to make the dataset complete. We also included some pictures found on the internet. Our goal was to create a dataset that could help train a model to recognize different types of face masks from different angles. A model trained with this dataset can categorize images into three groups: a mask on the face, an incorrectly worn mask, or no mask at all. The organization of the dataset can be seen in Fig. 2.

Fig. 2 Dataset's organization structure

The dataset contains 14,535 images categorized into three primary categories: with mask (4,789), incorrectly masked (5,000), and without mask (4,746). Each category has subcategories based on the type of image, which makes the dataset useful for other computer vision tasks such as face recognition and occluded face detection [22]. The images include various depictions of people wearing masks, including masks incorrectly placed on the chin or covering only the mouth, simple and complex masks with different designs, and images of people not wearing masks with occlusions such as beards, long hair, or hands covering their faces.

Some of the data samples in the dataset are shown in Fig. 3.

Fig. 3 Example of images in the dataset

The dataset has a total of 14,535 images. The incorrect_masked class consists of 5,000 images, of which 2,500 are Mask_on_Chin and 2,500 are Mask_on_Mouth_Chin. The With_mask class has 4,789 images, of which 4,000 are simple with_mask and 789 are complex with_mask images. Similarly, without_mask has 4,746 images, of which 4,000 are simple and 746 are complex images. The data distribution of each class can be seen in Fig. 4.

Fig. 4 Data distribution: (a) distribution of each class; (b) Mask-on-chin and Mask-on-mouth-chin images within the incorrectly masked class; (c) distribution of simple and complex images in the correctly masked class; (d) distribution of the unmasked images

In addition, the proposed dataset has been compared to the standard datasets that have been regularly used for face mask identification algorithms in Table 1.

Table 1 Comparison of various standard face mask datasets

4 Methodology

This study develops a face mask detection model based on transfer learning of MobileNetV2, attacks the model with an untargeted FGSM adversarial attack, and proposes a framework for a robust face mask detection model that resists such attacks. The proposed face mask model is first trained on the custom compiled dataset for effective and efficient training. Adversarial images are then created by computing a perturbation for each test image using the FGSM technique and combining it with the corresponding clean image. Finally, the non-robust face mask model is modified using the proposed framework.

4.1 Transfer learning of MobileNetV2

MobileNetV2 is a state-of-the-art CNN model that performs well on devices such as mobile phones. Its layers have been trained on the ImageNet dataset [9] and have attained an optimal set of weights. It is fast, accurate, and lightweight (resource-efficient), making it well suited to resource-constrained and real-time systems. Since face mask detection systems are typically incorporated into surveillance systems with limited resources, such as CCTV cameras, it is an appropriate backbone for the proposed method. MobileNetV2 employs a dual-block structure: one is a residual block with a stride of 1, and the other, with a stride of 2, is used for downsampling. Both blocks have three layers. The first layer is a 1 × 1 convolution with a ReLU6 non-linearity, the second layer is a depthwise convolution, and the third layer is another 1 × 1 convolution without any non-linearity.
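For illustration, the following is a minimal Keras sketch of the bottleneck block just described; the expansion factor, use of ReLU6, and batch normalization follow the MobileNetV2 design, while the helper name and arguments are our own assumptions (the pretrained network used in our model is obtained directly from tf.keras.applications).

```python
from tensorflow.keras import layers

def inverted_residual_block(x, expansion, out_channels, stride):
    """Sketch of a MobileNetV2 bottleneck: 1x1 expansion conv with ReLU6,
    3x3 depthwise conv with ReLU6, and a linear 1x1 projection conv; a
    residual connection is added when stride == 1 and shapes match."""
    in_channels = x.shape[-1]
    h = layers.Conv2D(expansion * in_channels, 1, padding="same", use_bias=False)(x)
    h = layers.BatchNormalization()(h)
    h = layers.ReLU(6.0)(h)
    h = layers.DepthwiseConv2D(3, strides=stride, padding="same", use_bias=False)(h)
    h = layers.BatchNormalization()(h)
    h = layers.ReLU(6.0)(h)
    h = layers.Conv2D(out_channels, 1, padding="same", use_bias=False)(h)  # linear projection
    h = layers.BatchNormalization()(h)
    if stride == 1 and in_channels == out_channels:
        h = layers.Add()([x, h])
    return h
```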

For the face mask detection model, we removed the last layer of the MobileNetV2 and added four trainable layers, including Dense 128, Dense 62, Dense 32, and Dense 3. We also used the dropout layers between these layers to avoid overfitting. MobileNetV2 layers are kept frozen so that they are not trainable during the training of the face mask model. The last layer of the model consists of 3 neurons, each corresponding to the required class/output. Transfer learning of MobileNetV2 allowed us to save significant computational costs while improving the result. Figure 5 illustrates the architecture of the face mask detection model.
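A minimal sketch of this transfer-learning setup in TensorFlow/Keras is shown below. The dense layer sizes follow the text (128, 62, 32, 3); the pooling layer, dropout rates, optimizer, and learning rate are illustrative assumptions rather than the exact training configuration.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

# Pretrained MobileNetV2 backbone without its classification head.
base = tf.keras.applications.MobileNetV2(
    input_shape=(224, 224, 3), include_top=False, weights="imagenet")
base.trainable = False  # freeze the MobileNetV2 layers

model = models.Sequential([
    base,
    layers.GlobalAveragePooling2D(),
    layers.Dense(128, activation="relu"),
    layers.Dropout(0.3),               # dropout between dense layers (rate assumed)
    layers.Dense(62, activation="relu"),
    layers.Dropout(0.3),
    layers.Dense(32, activation="relu"),
    layers.Dense(3, activation="softmax"),  # with mask / incorrect mask / no mask
])
model.compile(optimizer=tf.keras.optimizers.Adam(1e-4),
              loss="categorical_crossentropy", metrics=["accuracy"])
```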

Fig. 5 Face mask detection model architecture based on transfer learning of MobileNetV2

During model training, we used 93% of the data for training and the remaining 7% for testing. The model classifies an image with an accuracy of 98.51% on training data and 95.83% on testing data. The parameters and hyperparameters used during training are listed in Table 2. The learning rate controls how fast the network weights are updated. The batch size controls the number of samples processed before the network updates its parameters. The number of epochs controls the number of training iterations. The dropout rate determines the percentage of parameters that are dropped during training to prevent overfitting. A minimal training sketch follows Table 2.

Table 2 Face mask detection model parameters and hyperparameters
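The sketch below illustrates the 93%/7% split and the training call. The arrays `images` and `labels`, the batch size, and the random seed are assumptions; the actual hyperparameter values are those reported in Table 2.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# `images` is assumed to be an array of preprocessed 224x224x3 images in [0, 1]
# and `labels` their one-hot encoded classes; `model` is the network built above.
x_train, x_test, y_train, y_test = train_test_split(
    images, labels, test_size=0.07, stratify=np.argmax(labels, axis=1),
    random_state=42)

history = model.fit(
    x_train, y_train,
    validation_data=(x_test, y_test),
    epochs=20,        # 20 epochs, as reported in Section 5.1
    batch_size=32,    # assumed value
)
```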

4.2 Untargeted FGSM (fast gradient sign method) attack on the proposed model

We first provide notations and then introduce the formulation of an adversarial attack on the proposed face mask detection system.

Notations: Let F be the face mask detection model, x an original input, l the true class of the input (the desired output of the model), r a perturbation, t the target class, and θ the parameters of the face mask detection model.

x ∈ samples from the dataset.

l ∈ {with-mask, without-mask, incorrect-mask}.

Formulation: Consider a sample x from the Sophisticated Face Mask Dataset, for instance, an image of a face with no mask. We compute a perturbation r for x. Adding this perturbation r to the original image x yields an adversarial example, i.e.

x′ = x + r is an adversarial example.

Note: x′ should be imperceptible to humans, i.e., it should still appear to a human as class l (no-mask). However, given x′ as input to F, the output of F is a class other than no-mask.

The adversarial attack against the proposed model is carried out by computing an adversarial perturbation for each test image and adding it to the corresponding clean image. We employed the FGSM strategy for perturbation computation due to its simplicity and attacking efficiency. The framework of the adversarial attack on the face mask model is shown in Fig. 6, and the TensorFlow methods and classes used to generate the adversarial examples are listed in Table 3. FGSM is a white-box attack in which the gradient of the loss is computed with respect to the input pixels and the sign operation is applied to the gradient matrix [17]. Back-propagation is used to calculate the gradient, which determines the perturbation direction. The image is perturbed by taking a single large step that increases the classifier's loss; intuitively, for each pixel of the input image, we ask whether the loss grows if that pixel's value is increased or decreased. As the last step, a small value ε (epsilon) is multiplied by the signed gradient matrix and the result is added to the input image. The perturbation is given by Equation (1):

$$\delta = \varepsilon \cdot \operatorname{sign}\left(\nabla_x \, \mathrm{loss}\left(x, l\right)\right)$$
(1)

Fig. 6 Framework of the adversarial attack on the proposed face mask model

Table 3 TensorFlow methods used for the generation of adversarial examples for the proposed face mask system

where,

l is the true label of the clean image x, ∇x loss(x, l) is the gradient of the loss function with respect to x at the current values of the model parameters, and ε (epsilon) is the magnitude of the perturbation, i.e., it bounds how much each pixel of x′ may differ from the corresponding pixel of x. The adversarial example is then computed as x′ = x + δ.

Since we apply an untargeted FGSM adversarial attack to the proposed face mask system, the corresponding objective function is given in Equation (2):

$$\min_{x^{\prime}} \left\Vert x^{\prime} - x \right\Vert$$

subject to

$$F(x) = \ell, \quad F\left(x^{\prime}\right) = \ell^{\prime}, \quad \ell \ne \ell^{\prime}$$
(2)

Here F(x) correctly classifies the input, but F(x′) misclassifies it as some other class. The objective function states that the magnitude of r must be small enough to remain imperceptible, yet still sufficient to make the model misclassify.

Following is the algorithm for generating the adversarial example for the face mask classifier model.

Algorithm 1 Adversarial examples for the face mask detection model

In STEP 1, we iterate through all the images in the test set for perturbation computation. In STEP 1.1, the selected image is preprocessed before being sent for prediction: it is resized to 224 × 224 × 3 and normalized by dividing by 255.0. The preprocessed image is then forwarded to the trained model for prediction in STEP 1.2. After the prediction, the loss between the actual label of the image and the prediction is computed in STEP 1.3. In STEP 1.4, the gradient, i.e., the rate of change of the loss with respect to the input image, is computed. The sign operation and epsilon are then applied to the gradient to compute the perturbation in STEP 1.5. The adversarial image is computed by adding the perturbation to the clean image in STEP 1.6. In STEP 2, the adversarial images are given as input to the face mask detection system for classification.
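The steps above can be sketched in TensorFlow as follows. The helper name `fgsm_example`, the loss object, and the assumption that the image is already preprocessed (shape (1, 224, 224, 3), scaled to [0, 1]) with a one-hot label are ours, not the exact implementation used in the experiments.

```python
import tensorflow as tf

loss_fn = tf.keras.losses.CategoricalCrossentropy()

def fgsm_example(model, image, label, epsilon=0.010):
    """Generate an untargeted FGSM adversarial example for one image."""
    image = tf.convert_to_tensor(image, dtype=tf.float32)
    with tf.GradientTape() as tape:
        tape.watch(image)                       # track gradients w.r.t. the pixels
        prediction = model(image)               # STEP 1.2: forward pass
        loss = loss_fn(label, prediction)       # STEP 1.3: loss against the true label
    gradient = tape.gradient(loss, image)       # STEP 1.4: d(loss)/d(pixels)
    perturbation = epsilon * tf.sign(gradient)  # STEP 1.5: epsilon * sign(gradient)
    adversarial = tf.clip_by_value(image + perturbation, 0.0, 1.0)  # STEP 1.6
    return adversarial
```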

We have illustrated the generation of adversarial perturbation and adversarial examples in Fig. 7. An adversarial image is produced by setting the epsilon value as 0.010. Although it is clear from the figure that the adversarial image is indistinguishable from the original image, the model still misclassifies it.

Fig. 7 Illustration of the adversarial example generation

The epsilon parameter in FGSM determines the magnitude of the perturbation, i.e., it represents the amount of change in pixel values in the adversarial image. In Fig. 8, we show adversarial images generated with different epsilon values. The impact of epsilon on the clarity of the adversarial image is visible: as epsilon increases, the changes in the adversarial image become perceptible.

Fig. 8 Adversarial images computed with different epsilon values: (a) epsilon = 0.00, (b) epsilon = 0.090, (c) epsilon = 0.0150, (d) epsilon = 0.020

4.3 Improving the robustness of the proposed face mask detection model

In this section, we design a framework that enhances the model's robustness to adversarial attacks. We employ an adversarial learning approach in which the dataset is modified by adding adversarial examples and the model is retrained on the modified dataset. However, we modify conventional adversarial learning by introducing an adversarial generator module that generates adversarial examples of the same image at different epsilon values and adds them to the Sophisticated Face Mask Dataset, whereas a conventional adversarial generator produces perturbations at a single epsilon value. Producing adversarial images with varying epsilon values ensures that the model is resilient to a variety of adversarial perturbations instead of a single one. The framework of the robust model is shown in Fig. 9.

Fig. 9 Framework of the robust face mask detection system

4.3.1 Formulation

The goal is to enhance the robustness of the face mask detection model F to adversarial attacks. We employ an adversarial learning approach by modifying the dataset, say D, with adversarial examples and retraining the model F on the modified dataset. We introduce an adversarial generator module A that takes an image and an epsilon value as input and outputs an adversarial example. Conventionally, A generates adversarial examples by adding perturbations at a single epsilon value. We instead modify A so that it generates adversarial examples of the same image at different epsilon values and adds them to the dataset D.

Let ε1, ε2, ..., εn be n different epsilon values. Then, A generates n different adversarial examples of the same image x, denoted x'11, x'12, ..., x'1n, corresponding to ε1, ε2, ..., εn, respectively. This modification aims to ensure that the model F is resilient to a variety of adversarial perturbations instead of a single perturbation. We create a modified dataset D' by adding the adversarial examples (x'11, x'12, ..., x'1n), (x'21, x'22, ..., x'2n), ..., (x'm1, x'm2, ..., x'mn) for each epsilon value to the original dataset D, where m is the number of randomly selected clean samples from D. That is, D' = {(xi, yi), (x'ij, yi) : i ∈ {1, 2, …, m}, j ∈ {1, 2, …, n}}, where (xi, yi) is the ith image-label pair in D and x'ij is the adversarial image of the ith clean image computed with the jth epsilon value εj. We retrain the model F on the modified dataset D' to enhance its robustness to adversarial attacks. Algorithm 2 outlines the steps of the robust face mask detection model.

Algorithm 2 Robust face mask detection model

STEP 1 iterates through a subset of the images in the regular dataset. Instead of computing the perturbation at a single epsilon value, we iterate through a range of epsilons in STEP 1.1; for example, e1, e2, ..., eN are the epsilon values used to generate the different adversarial examples of the same clean image. In STEP 1.1.1, the trained non-robust model predicts the output of the clean image. STEPS 1.1.2 and 1.1.3 compute the loss and the gradient, respectively. In STEP 1.1.4, the perturbation is computed by applying the sign operation to the gradient and multiplying it by the current epsilon value, i.e., perturbation = sign(gradient) * e1. STEP 1.1.5 computes the adversarial image for the current epsilon value, say e1. The generator iterates through all epsilon values to generate adversarial images of the same clean image at different epsilon values. In STEP 2, the non-robust model is retrained on the modified dataset consisting of clean and adversarial images. The resulting model is robust to adversarial attacks.
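The following sketch shows how such a multi-epsilon adversarial subset could be assembled, reusing the `fgsm_example` helper from Section 4.2. The epsilon schedule, subset size, and the function name `build_adversarial_dataset` are illustrative assumptions.

```python
import random
import numpy as np

EPSILONS = [0.003, 0.006, 0.009]   # illustrative epsilon schedule (e1, ..., eN)
SUBSET_SIZE = 2000                 # illustrative size of the random clean subset

def build_adversarial_dataset(model, x_train, y_train):
    """Return a random clean subset plus its adversarial copies at several epsilons."""
    indices = random.sample(range(len(x_train)), SUBSET_SIZE)
    images, labels = [], []
    for i in indices:
        x, y = x_train[i], y_train[i]
        images.append(x)                           # keep the clean image
        labels.append(y)
        for eps in EPSILONS:                       # STEP 1.1: iterate over epsilons
            x_adv = fgsm_example(model, x[None, ...], y[None, ...], eps)
            images.append(np.squeeze(x_adv.numpy(), axis=0))
            labels.append(y)                       # adversarial copy keeps the true label
    return np.stack(images), np.stack(labels)

# STEP 2: retrain the non-robust model on the modified dataset.
# x_aug, y_aug = build_adversarial_dataset(model, x_train, y_train)
# model.fit(x_aug, y_aug, epochs=20, batch_size=32)
```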

5 Experimental results

We trained the proposed face mask model on Google Colaboratory and then saved it to a local machine. The model was then loaded and the adversarial attack was executed on a local machine with 8 GB of RAM and an NVIDIA GeForce 930MX GPU. Finally, the non-robust face mask model was modified using the proposed framework and retrained on Google Colaboratory.

5.1 Pre attack model performance

The face mask detection model was trained for 20 epochs on 93% of the total data. The remaining 7% were used for validation/testing purposes. The model performed well and classified the images as the with-mask, without-mask, and incorrect-masked faces with an accuracy of 98.51% and 95.83% on training and testing data, respectively. The learning curves of the model are shown in Fig. 10.

Fig. 10 Learning curves of the model's accuracy and loss: (a) accuracy on training and testing data; (b) loss on training and testing data

The plot shows that as the number of epochs increases, the training and validation accuracy increase while the training and validation loss decrease. The testing and training accuracy remain close to each other, which indicates that the model is not overfitting.

Several other metrics were used to assess the model's performance, including recall, precision, F1-score, accuracy, macro average, and weighted average. All of these metrics were computed with the classification_report method of the scikit-learn package (a usage sketch follows the list below). The metrics are defined as follows:

  • Accuracy: represents the number of correctly classified data instances over the total number of data instances.

  • Precision: It is the proportion of correctly predicted positive observations to the total predicted positive observations.

  • Recall or sensitivity: It is given by the proportion of true positives to all positives.

  • F1-Score: It is the harmonic mean of recall and precision, computed as:

    $$\text{F1-Score} = \frac{2 \times \text{Recall} \times \text{Precision}}{\text{Recall} + \text{Precision}}$$
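As referenced above, the per-class metrics can be obtained directly from scikit-learn's classification_report; the class names and label arrays below are dummy placeholders, not the paper's results.

```python
from sklearn.metrics import classification_report

class_names = ["incorrect_mask", "with_mask", "without_mask"]
y_true = [0, 1, 2, 1, 0, 2]   # placeholder ground-truth labels
y_pred = [0, 1, 2, 0, 0, 2]   # placeholder model predictions
print(classification_report(y_true, y_pred, target_names=class_names, digits=2))
```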

We first computed the performance of the face mask detection model on the clean test dataset. The model was evaluated on 1,053 unseen test images, of which 353, 350, and 350 belong to incorrect-mask, with-mask, and without-mask, respectively. The model achieved a precision of 95%, a recall of 94%, and an F1-score of 94%. The results on the clean test data are shown in Fig. 11.

Fig. 11 Precision, recall, and F1-score of the face mask detection model on clean test data

The output of a face mask detection model is shown in Fig. 12. When given a clean masked face image, the model correctly predicts it as a masked image with 99.95% confidence.

Fig. 12 The model correctly classifies the image as masked

5.2 Post attack model performance

The classifier achieved excellent accuracy on the training data and performed well on the test data. However, this was not the case once it was attacked by the untargeted FGSM. As discussed in Section 4, the model's performance was evaluated on adversarial images generated at different epsilon values in order to compare the impact of each epsilon on the model's performance. It is evident from Table 4 that increasing the value of epsilon degrades the model's performance. At an epsilon value of 0.009, the accuracy of the model dropped from 95.83% to 14.53%, which is alarming. The accuracy and loss at different epsilon values are plotted in Fig. 13; the plot likewise shows that increasing epsilon decreases the model's accuracy.
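The per-epsilon evaluation can be sketched as a simple loop over adversarial copies of the test set; the epsilon list below and the reuse of `fgsm_example` and the earlier train/test split are assumptions.

```python
import numpy as np

# Evaluate the non-robust model on adversarial test sets at several epsilons.
for eps in [0.001, 0.003, 0.005, 0.007, 0.009]:   # illustrative epsilon values
    x_adv = np.concatenate([
        fgsm_example(model, x[None, ...], y[None, ...], eps).numpy()
        for x, y in zip(x_test, y_test)])
    loss, acc = model.evaluate(x_adv, y_test, verbose=0)
    print(f"epsilon={eps:.3f}  accuracy={acc:.4f}  loss={loss:.4f}")
```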

Table 4 Accuracy and loss of regular model on various epsilon of FGSM
Fig. 13 Accuracy and loss curves on various epsilon values

In addition, we have computed the precision, recall, and f1-score on different epsilon values. The results reinforce our conclusion that the performance of the model has degraded. The list of classification reports is shown in Fig. 14.

Fig. 14 Accuracy, precision, recall, and F1-score on different epsilon values

The adversarial version of the clean masked image created by FGSM forces the model to classify it incorrectly as an incorrect mask with 100% confidence, as shown in Fig. 15. Notably, the model mislabels the adversarial image with 100% confidence, which even exceeds the 99.95% confidence with which it correctly classified the clean image.

Fig. 15 The model misclassifies an adversarial image as an 'incorrect mask' with 100% confidence

5.3 Robust model performance

The proposed robust framework is based on adversarial learning. However, due to limited resources, including RAM, the adversarial generator selected only a random subset of the training dataset for adversarial example generation and added the adversarial examples to that subset. The model was then retrained on the modified dataset containing the subset of training data and its adversarial examples. Despite using only a portion of the training data, the model generalized well. Figure 16 depicts the robust model's learning curves. There was a noticeable drop in accuracy on clean data, from 95.78% to 92.79%, when evaluated on the modified dataset, but the defended model performs remarkably well on the images that once fooled it, maintaining an accuracy of 92% on adversarial images computed at different epsilon values. Table 5 shows the performance of the robust model under the FGSM attack at different epsilons. Compared to the regular model's post-attack performance in Table 4, the robust model performed well on adversarial data; e.g., at epsilon 0.009, the non-robust model's accuracy is 14.53%, whereas the robust model achieved 92% accuracy.

Fig. 16 (a) Training vs. validation accuracy of the robust model on the modified dataset; (b) training and testing loss

Table 5 Accuracy and loss of robust model on various epsilon of FGSM

The accuracy and loss of the robust model at different epsilon values are plotted in Fig. 17. The model performed well even under varying epsilon values of the FGSM attack, maintaining an accuracy of about 92% on adversarial data.

Fig. 17 The robust model's accuracy and loss on different epsilon values of FGSM

The output of the robust model is shown in Fig. 18, where it correctly classifies the adversarial image of the masked face into its intended class, 'with mask'. Note that the same adversarial image fooled the regular model into misclassifying it as an 'incorrect mask', as shown in Fig. 15.

Fig. 18 The robust model correctly classifies an adversarial image of the masked face with 92.23% confidence

6 Conclusion

In this study, we demonstrated the vulnerabilities of the deep learning-based face mask detection model against adversarial examples and designed a framework for a more robust face mask detection system resistant to these attacks.

We began by introducing the custom dataset used to train the face mask detection model. The model was created by applying transfer learning of the state-of-the-art classification model MobileNetV2 and attained a respectable accuracy of 95.83% when tested on clean inputs, as shown in Figs. 10 and 11. The model was subsequently subjected to adversarial examples computed at various epsilon values of the FGSM attack. The results in Table 4 revealed that the model's performance declined further as the epsilon value increased; at epsilon 0.009, the model's accuracy dropped to 14.53%, which is alarming. Although we employed extremely small epsilon values to create imperceptible perturbations, the model misclassified the adversarial inputs with greater certainty than the clean ones. In Fig. 15, the adversarial image of the masked face was recognized as an incorrectly masked face with 100% confidence. During the FGSM attack, the majority of the images in the test data were wrongly classified as incorrectly masked faces.

We employed the adversarial learning strategy to enhance the robustness of the model, which involved modifying the custom dataset with adversarial samples. However, in our defense approach, we modified the conventional adversarial learning approach by using only a portion of the total dataset and generating adversarial examples on the combination of epsilon values. The results in Table 5 and Fig. 17 demonstrate that the model is robust, achieving 92% accuracy on adversarial examples.

Face mask identification is critical for effectively combating COVID-19. Thus, further study of vulnerabilities and defensive techniques is required before these systems can be deployed. We believe our findings will assist researchers in improving the security of their models and raise awareness of the need to build face mask detection models with multiple protection strategies.

7 Future scope

In the future, we would like to work on the following areas:

  • The present study is based on evaluating the model’s performance on adversarial images. There is a potential to carry out adversarial attacks on the face mask surveillance systems, forcing the model to misclassify from the live video data.

  • In our particular scenario, we attacked the model using the white-box FGSM attack because the specifics of the model, including its gradient, architecture, and training data, were readily available. In practical cases, when the model is deployed, these specifics are often concealed from users; even then, there is still scope to attack the model using black-box strategies [7].

  • In our work, we used the FGSM strategy for adversarial example computation due to its simplicity. In the future, we would like to attack the model using other attack strategies, such as the Basic Iterative Method (BIM) [34] and the Jacobian-based Saliency Map Attack (JSMA) [47].

  • As discussed, there are other strategies for generating adversarial perturbations. We aim to design a universal framework that is robust to adversarial examples generated by any strategy.

  • To evaluate the model's performance on adversarial images, we computed the perturbation of each clean image separately and added it to its corresponding clean image. However, it is also possible to compute a universal perturbation [43], i.e., a single perturbation matrix applied to all test samples.

  • Due to limited resources, the model was retrained on a fraction of the training data and adversarial examples during adversarial learning. Although the robust model performed well on adversarial examples, there is room for improvement by retraining the model on the complete dataset and their adversarial examples.

  • In the future, we aim to investigate several different defensive strategies, such as network distillation, which extracts knowledge from deep neural networks to ensure their robustness [48], and adversarial example detection during the testing stage [40].