1 Introduction

The COVID-19 pandemic has devastated healthcare facilities and treatment systems in every country. The virus is disseminated by direct contact with contaminated respiratory droplets (produced by sneezing and coughing). Anyone who touches virus-contaminated surfaces and then their face may become ill, with symptoms such as shortness of breath, cough, and fever. The WHO recommends wearing a face mask in crowded and public locations to minimize viral transmission through the nasal or oral passages [12, 14, 62]. Governments of COVID-19-affected nations have passed laws mandating the wearing of face masks. As a result, large groups of individuals in public spaces and congested locations must be monitored for face mask violations. Conventional procedures for checking face mask violations rely on human monitoring, which is not always feasible and is error-prone.

Deep neural network (DNN) techniques have made significant progress over the past decade, showing promise in a variety of classification tasks [33, 55] and pattern recognition [26]. In the fight against the COVID-19 pandemic, artificial intelligence, and more specifically machine and deep learning, has proved to be a successful and crucial tool. It has been beneficial for the early prediction of infection from patients' historical data [5]. In addition, deep learning models have been proposed to detect COVID-19 from readily available chest X-ray or CT images [27]. As discussed earlier, wearing a mask is the most effective way to curb the rapid spread of COVID-19. However, ensuring that masks are worn properly in public or crowded places is difficult with conventional approaches, which rely on human monitoring. Artificial intelligence offers a solution in the form of computer-aided face mask detection systems, in which methodologies primarily based on deep learning are applied to produce rapid, accurate, and trustworthy systems. A large number of researchers have developed a variety of DNN architectures to detect face mask violations. Most of these methods are based on transfer learning [61], while some introduce novel architectures. Deep learning models have shown high sensitivity and specificity when detecting face mask violations.

Deep learning models need a significant amount of data to train effectively. When given a new unseen input sample, the model infers the corresponding output during the inferencing phase. For example, once the face mask detection model is trained on the dataset, it is given unseen data and returns a classification result: with mask, no mask, or incorrect mask. Deep learning systems hold several confidential and valuable assets, including the training data, the trained model, and the model parameters. However, recent research has proven that these assets are vulnerable to various attacks during the training and inferencing phases, allowing attackers to undermine the model's security and privacy. Attacks on deep learning algorithms are broadly categorized into four types: adversarial attacks, data poisoning attacks, model inversion attacks, and model extraction attacks. Data poisoning is performed before the model's training process [6]: the training data is contaminated using various techniques, such as adding mislabeled data or trigger patches to the input data, with the aim of forcing the model to misclassify. A model inversion attack concerns the confidentiality of the data used for the model's training [31]. In a model extraction attack [28], the attacker tries to obtain the internal confidential details of the model, such as its parameters, hyperparameters, and architecture. In an adversarial attack, input samples are modified with small, skillfully crafted perturbations that cause the model to misclassify during inferencing [53]. In computer vision, adversarial attacks are primarily applied to classification models during the inferencing phase. In Fig. 1, we illustrate the various attacks on deep learning models in their respective phases.

Fig. 1 Various attacks on the deep learning model's assets

The existence of adversarial attacks calls into question the widespread deployment of deep learning models for face mask detection, especially in security-sensitive settings. Although various deep learning-based face mask detection systems have been proposed, none of them discusses the possible vulnerabilities of the system or how to mitigate them. This study reveals the vulnerabilities of conventional deep learning-based face mask detection systems and proposes a comparatively more robust system against adversarial attacks. As part of this research, we first developed a custom dataset for training the proposed face mask detection model. The model is based on transfer learning of the state-of-the-art classification model MobileNetV2 [54] and classifies images into three classes (incorrect mask, with mask, and no mask) with an accuracy of about 96%. We then assessed the vulnerability of the developed model to adversarial examples. In particular, we attacked the model with the untargeted Fast Gradient Sign Method (FGSM) [17]. Adversarial samples were computed for several very small settings of the FGSM epsilon parameter, and the model's performance was evaluated. The resulting adversarial images are imperceptibly different from the original images, yet they force the face mask model to misclassify with alarmingly high confidence. The results show that the attack significantly degrades the model's performance. To enhance the model's resilience against adversarial attacks, we implemented an adversarial training strategy with slight alterations [15]. Specifically, instead of retraining the model on the entire dataset, we trained it on a randomly chosen subset of the training data along with the corresponding adversarial images. This modified training approach improved the model's performance on adversarial examples.

1.1 Basic concepts

In this section, we discuss some essential terminology related to attacks on deep learning-based face mask detection models. Attacks can be classified according to whether the attacker has access to the model's internals, giving two types: white-box and black-box attacks. Additionally, attacks can be classified according to whether the model is compelled to produce a particular label, giving targeted and untargeted attacks.

  • White box Attack Approach: The attacker has full access to the system, say ‘F’, i.e., the entire information of the model, including its gradients, parameters, hyperparameters, and training dataset. The adversary is provided with the classifier ‘F’ in the white-box threat model. Gradient descent on the adversarial loss, or an approximation thereof, is a powerful attack approach in this scenario. To ensure that the changes remain undetectable, one can regulate the perturbation norm ‖δ‖2 either by stopping the loss optimization early or by incorporating the norm directly as a regularizer or constraint in the loss optimization.

  • Black box Attack Approach: The attacker has only partial knowledge of the model. Gradients, parameters, and the training set might not be accessible or known to the attacker. Arguably, many white-box assumptions are impractical in real-world situations. For example, the model ‘h’ may be accessible to the public only via an Application Programming Interface (API) that accepts input queries, depriving the attacker of access to the model’s internal details.

  • Targeted Attack Approach: The face mask detection system is compelled to output a specific target label. For example, given an input image of a person wearing a mask, the model may be forced to output an unmasked face.

  • Untargeted Attack Approach: It is entirely focused on the misclassification of a model, regardless of the output.

1.2 Contribution

  • The custom face mask dataset was developed and made publicly available on a KAGGLE repository (https://www.kaggle.com/datasets/shiekhburhan/face-mask-dataset).

  • We developed a deep learning-based face mask detection model using the conventional approach. The model is based on transfer learning of MobileNetV2 and performed well on unseen clean data samples, with an accuracy of 95.85%.

  • We demonstrated that the conventional model is susceptible to adversarial attacks by exposing it to the FGSM attack.

  • We proposed the framework for a robust face mask detection model that is resistant to adversarial attacks such as FGSM.

1.3 Paper organization

Following the introduction, Section 2 reviews the literature on face mask detection and adversarial attacks. The proposed dataset and its comparison with other standard datasets are described in Section 3. The methodology, which is broken down into three parts (training a face mask classifier, attacking it with FGSM, and improving the robustness of the face mask model), is described in Section 4. Section 5 discusses the experimental results in detail. Section 6 presents the conclusion, and Section 7 outlines the future scope of the study.

2 Related work

The literature review has been split into two sections. First, we will go through the existing literature on face mask detection systems, and then we will discuss the literature on adversarial attacks on COVID-19 monitoring systems, including face mask detection systems.

2.1 Face mask detection models

Due to the ongoing pandemic, there has been considerable interest in projects with similar goals. Among the approaches reported in the literature, most researchers utilize convolutional neural networks (CNNs) because of their outstanding performance and their capacity to extract valuable features from image data. Other works have employed hybrid strategies that combine machine learning methodologies with or without deep learning.

2.1.1 CNN-based approaches

Contrary to ML approaches, we do not need to manually extract the features in CNN-based methods. CNN uses convolution and pooling techniques to extract valuable features from the input. We have discussed some popular CNN-based face mask detection models in the following.

In [16], the MAFA (Masked Faces) dataset was first introduced. The authors built a CNN model capable of detecting facial occlusion, including masks. Their approach has three key components: a proposal module, an embedding module, and a verification module. The first module combines two CNNs and retrieves facial image characteristics. The second module focuses on detecting facial landmarks that are not obscured by occlusion; the LLE algorithm is implemented at this stage [50]. In the final module, classification and regression tasks are carried out using a CNN to determine whether an item is a face and to estimate the position of missing facial cues. Identifying side-facing faces degraded the model's performance, and the dataset contains more occluded than masked faces, so training on this dataset alone is not always viable for face mask identification. Performance was determined by calculating the precision for each parameter setting and averaging these precisions, yielding an average precision of 74.6%.

In [8], similar to our model, MobileNetV2 was used to classify face masks, together with a Caffe-based face detector. A small dataset of 4,095 images was used. Additionally, the dataset has only two classes, masked and unmasked; hence, the trained model cannot detect incorrectly masked faces (e.g., a mask worn below the nose). It achieved a decent F1-score of 0.93.

In [3], a dataset known as the “MASKED FACE DATASET” was proposed and three cascaded CNN architectures were used for face mask detection. The dataset consists of only 200 images. To overcome overfitting, the authors used transfer learning and fine-tuned the model on the WiderFace dataset [63]. The first CNN consists of five layers and is used to scale the input image; the second and third CNNs consist of seven layers each. The advantage of using three cascaded CNNs is that false detections are progressively eliminated, making the prediction stronger; however, it also makes the approach computationally expensive. The model achieved an accuracy of 86.6% and a recall of 87.8% on their proposed dataset.

In [49], the authors presented the SRCNet model for face mask detection. The model comprises two networks: a classification network and an image super-resolution (SR) network. The model can adequately classify Incorrect Facemask-Wearing (IFW), Correct Facemask-Wearing (CFW), and No Facemask-Wearing (NFW). The model was trained on the MMD (Medical Masked) dataset [1] and adopted the MobileNetV2 CNN. The design was well organized and effective; however, the training dataset was very small and the face mask detection speed was slower than that of other algorithms. The model achieved an accuracy of 98.07%.

The approach in [24] identifies three classification categories: no-mask, improper-face-mask, and with-mask. The model was trained on a dataset consisting of 35 masked and unmasked face images. Before training, the dataset was preprocessed and scaled to the required dimensions. The model first detects the face, extracts it from the input, and then applies the face masknet model for classification. The data used is extremely limited and regionally specific. The accuracy of the model was reported to be 98.06%.

In [38], a novel face mask detection technique combining YOLOv2 [52] and ResNet50 [18] was proposed. The FMD [35] and MMD [1] datasets were used to train and test the model, and the SGDM and ADAM optimizers were compared [32]. The model achieved an average precision of 81%.

In [23], the VGG16 architecture was utilized to identify and categorize facial expressions [55]. Their VGG16 model, trained on the KDEF database, achieved an accuracy of 88%.

Transfer learning of the InceptionV3 model [56] was used in [29]. The last layer of the model was removed and five new trainable layers were added. The final layer consists of two neurons followed by a softmax activation function, corresponding to masked and unmasked faces, respectively. The model obtained a training accuracy of 99.91% and a testing accuracy of 100% in 80 epochs.

In [42], VGG-16 CNN was used for face mask detection. The dataset they developed consists of 25,000 images, and the model was trained on it. The mask-covered area in an image was first segmented and extracted. The proposed model used the Adam optimizer as an optimization function. Their algorithm was 96% accurate at spotting face masks.

The SSDMNV2 model is proposed in [44]. The authors used an approach similar to ours: a single-shot multi-box detector [37] for face detection and MobileNetV2 for classification. The classification accuracy was around 92% and the F1-score was 0.93. Our proposed system outperforms it with 98.6% training accuracy, 97% testing accuracy, and a 0.95 F1-score.

In [36], YOLOv3 was used for face detection. It was trained on the CelebA and WIDER FACE [63] databases. The model was later evaluated on the FDDB database [25] and achieved an accuracy of 93.9%.

2.1.2 Hybrid-based approaches

Deep learning and machine learning algorithms were combined in [39]. The deep learning model ResNet50 was employed for feature extraction, while machine learning methods such as Support Vector Machines and Decision Trees were used for classification. One of the four datasets used contains both real and fake face masks. When trained on the dataset containing real face masks, the decision tree classifier did not achieve a decent classification accuracy on fake face masks (68%).

In [45], a model was proposed that triggers an alarm for surgical face mask violations in the operating room. Viola-Jones face detection was used for face detection and LogitBoost for face mask detection [59]. One problem with the model was that it made mistakes when clothing was found near the face; synthetic rotation was used to address this issue. In addition, the model was trained only on surgical face masks. The recall was reported to be above 95%, with a false positive rate below 5%.

In [4], a Haar cascade feature detector was utilized to recognize the nose and mouth within the detected face [58]. If the model identifies both the nose and mouth, it predicts an unmasked face; if it detects only the nose, it predicts an incorrectly masked face; and if it detects neither, it predicts a correctly masked face. This method is quick and straightforward, but it can only interpret full-frontal faces and can be tricked by covering the mouth and nose. Our proposed model is able to predict correctly across different face orientations and under occlusions such as a hand or hair on the face.

The Principal Component Analysis (PCA) algorithm was implemented in [11] for face mask violation detection. It performed well, with an accuracy of 96.25% for detecting faces without a mask, but its performance dropped to 68.75% when detecting faces with a mask.

2.2 Adversarial attack on COVID-19 monitoring models

Since it was first shown in [57] that neural networks can be attacked with adversarial techniques, the study of such techniques has become a hot topic in artificial intelligence, with researchers continually proposing novel adversarial attack methods and mitigation techniques. However, little research has so far addressed the challenges posed by adversarial attacks on computer-based COVID-19 combating technologies.

In [19], both targeted and untargeted universal adversarial perturbation (UAP) attacks were conducted against the COVID-Net architecture [60]. COVID-Net is a model developed specifically for classifying COVID-19 patients based on chest X-ray images. The results revealed that COVID-Net is susceptible to tiny UAPs: perturbations with a norm of only about 2% of the average image norm achieved success rates of >85% and >90% for nontargeted and targeted attacks, respectively. Under nontargeted UAPs, the DNN models classify most chest X-ray images as COVID-19 cases, whereas targeted UAPs cause them to classify most chest X-ray images into a specific target class.

In [51], six different applications used in the fight against COVID-19 were studied, and an adversarial attack was proposed for each. The attacked applications were: 1) recognizing whether a subject is wearing a mask from a live camera feed; 2) maintaining DL-based QR codes as immunization certificates; 3) adding explainability to DL algorithms via GRAD-CAM; 4) recognizing COVID-19 from CT scan images; 5) detecting noninvasive biometrics and identifying social distancing from a live camera feed; and 6) recognizing COVID-19 from X-ray image analysis. Existing adversarial methods were tested in this study, including FGSM, MI-FGSM, DeepFool, L-BFGS, C&W, BIM, Foolbox, PGD, and JSMA [13].

Using image translation methods, the authors of [30] calculated adversarial perturbation [2] for face recognition systems. To fool the targeted face recognition system, they would take a source image and distort it to create any desired facial appearance. White-box testing showed a 90% success rate for the attack. With the help of adversarial instances and the dynamic distillation technique, the model was able to reach an 80% success rate in the black-box setting. The translated images were realistic to the human eye and kept the person’s identity intact, while the perturbations were significant enough to fool trained defenses.

In [46], a COVID-19 diagnosis model based on transfer learning was proposed, built on the state-of-the-art VGG and Inception classification models. When the perturbation was increased from 0.0001 to 0.09, the VGG16 model's accuracy plummeted by more than 90% on X-ray images, while the Inception-v3 network's accuracy declined by 30%. The FGSM attack similarly exposed vulnerabilities, such as misdiagnosis, in CT imaging. The authors also demonstrate that the degree of perturbation has a sizable impact on how perceptible the attacks are to people.

In [41], machine learning techniques including support vector machine, random forest, logistic regression, and naïve Bayes were implemented to classify chest X-ray images into viral pneumonia, COVID-19, and healthy cases. The models used 1,400 images collected from the Kaggle public repository. The experimental outcomes confirmed that the support vector machine had high accuracy and excellent performance in classifying the disease, with 91.8% accuracy, 91.7% sensitivity, 95.9% specificity, 91.8% F1-score, and 97.6% AUC.

3 Dataset description

We uploaded a new dataset, called the Sophisticated Face Mask Dataset, to the Kaggle public repository on 16 June 2022. The dataset contains many different types of images of people wearing face masks. We collected these images from many different sources so that the dataset would be unbiased and diverse. We used some images from other popular datasets such as Masked Faces (MAFA) [16] and the Masked Face Detection Dataset (MFDD), as well as some simulated images, to make the dataset complete. We also included some pictures found on the internet. Our goal was to create a dataset that could help train a model to recognize different types of face masks from different angles. A model trained with this dataset can categorize images into three groups: a mask on the face, an incorrectly worn mask, or no mask at all. The organization of the dataset can be seen in Fig. 2.

Fig. 2 Dataset's organization structure

The dataset contains 14,535 images categorized into three primary categories: with mask (4,789), incorrectly masked (5,000), and without mask (4,746). Each category has subcategories based on the type of image, which makes the dataset useful for other computer vision tasks such as face recognition and occluded face detection [22]. The images include various depictions of people wearing masks, including masks incorrectly placed on the chin or covering only the mouth, simple and complex masks with different designs, and images of people not wearing masks with occlusions such as beards, long hair, or hands covering their faces.

Some of the data samples in the dataset are shown in Fig. 3.

Fig. 3 Example of images in the dataset

The dataset has a total of 14,535 images. The incorrect_masked class consists of 5,000 images, of which 2,500 are Mask_on_Chin and 2,500 are Mask_on_Mouth_Chin. The With_mask class has 4,789 images, of which 4,000 are simple with_mask and 789 are complex with_mask images. Similarly, without_mask has 4,746 images, of which 4,000 are simple and 746 are complex images. The data distribution of each class can be seen in Fig. 4.

Fig. 4 Data distribution: (a) distribution of each class; (b) Mask-on-chin and Mask-on-mouth-chin images within the incorrectly masked class; (c) distribution of simple and complex images in the correctly masked class; (d) distribution of the unmasked images

In addition, the proposed dataset has been compared to the standard datasets that have been regularly used for face mask identification algorithms in Table 1.

Table 1 Comparison of various standard face mask datasets

4 Methodology

This study develops a face mask detection model based on transfer learning of MobileNetV2, attacks the model with an untargeted FGSM adversarial attack, and proposes a framework for a robust face mask detection model that resists such attacks. The proposed face mask model is first trained on the custom compiled dataset for effective and efficient training. Adversarial images are then created by computing a perturbation for each test image using the FGSM technique and combining it with the corresponding clean image. Finally, the non-robust face mask model is modified using the proposed framework.

4.1 Transfer learning of MobileNetV2

MobileNetV2 is a state-of-the-art CNN model that performs well on devices such as mobile phones. Its layers have been trained on the ImageNet dataset [9] and have attained an optimal set of weights. It is fast, accurate, and lightweight (resource-efficient), making it well suited to resource-constrained and real-time systems. Since face mask detection systems are typically incorporated into surveillance systems with limited resources, such as CCTV cameras, it is an appropriate backbone for the proposed method. MobileNetV2 employs a dual-block structure: one is a residual block with a stride of 1, and the other, with a stride of 2, is used for downsampling. Both blocks have three layers. The first layer is a 1 × 1 convolution with a ReLU6 non-linearity, the second layer is a depthwise convolution, and the third layer is another 1 × 1 convolution without any non-linearity.
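For illustration, the following is a minimal Keras sketch of the bottleneck block just described; the expansion factor, use of ReLU6, and batch normalization follow the MobileNetV2 design, while the helper name and arguments are our own assumptions (the pretrained network used in our model is obtained directly from tf.keras.applications).

```python
from tensorflow.keras import layers

def inverted_residual_block(x, expansion, out_channels, stride):
    """Sketch of a MobileNetV2 bottleneck: 1x1 expansion conv with ReLU6,
    3x3 depthwise conv with ReLU6, and a linear 1x1 projection conv; a
    residual connection is added when stride == 1 and shapes match."""
    in_channels = x.shape[-1]
    h = layers.Conv2D(expansion * in_channels, 1, padding="same", use_bias=False)(x)
    h = layers.BatchNormalization()(h)
    h = layers.ReLU(6.0)(h)
    h = layers.DepthwiseConv2D(3, strides=stride, padding="same", use_bias=False)(h)
    h = layers.BatchNormalization()(h)
    h = layers.ReLU(6.0)(h)
    h = layers.Conv2D(out_channels, 1, padding="same", use_bias=False)(h)  # linear projection
    h = layers.BatchNormalization()(h)
    if stride == 1 and in_channels == out_channels:
        h = layers.Add()([x, h])
    return h
```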

For the face mask detection model, we removed the last layer of the MobileNetV2 and added four trainable layers, including Dense 128, Dense 62, Dense 32, and Dense 3. We also used the dropout layers between these layers to avoid overfitting. MobileNetV2 layers are kept frozen so that they are not trainable during the training of the face mask model. The last layer of the model consists of 3 neurons, each corresponding to the required class/output. Transfer learning of MobileNetV2 allowed us to save significant computational costs while improving the result. Figure 5 illustrates the architecture of the face mask detection model.
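A minimal sketch of this transfer-learning setup in TensorFlow/Keras is shown below. The dense layer sizes follow the text (128, 62, 32, 3); the pooling layer, dropout rates, optimizer, and learning rate are illustrative assumptions rather than the exact training configuration.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

# Pretrained MobileNetV2 backbone without its classification head.
base = tf.keras.applications.MobileNetV2(
    input_shape=(224, 224, 3), include_top=False, weights="imagenet")
base.trainable = False  # freeze the MobileNetV2 layers

model = models.Sequential([
    base,
    layers.GlobalAveragePooling2D(),
    layers.Dense(128, activation="relu"),
    layers.Dropout(0.3),               # dropout between dense layers (rate assumed)
    layers.Dense(62, activation="relu"),
    layers.Dropout(0.3),
    layers.Dense(32, activation="relu"),
    layers.Dense(3, activation="softmax"),  # with mask / incorrect mask / no mask
])
model.compile(optimizer=tf.keras.optimizers.Adam(1e-4),
              loss="categorical_crossentropy", metrics=["accuracy"])
```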

Fig. 5 Face mask detection model architecture based on transfer learning of MobileNetV2

During model training, we used 93% of the data for training and the remaining 7% for testing. The model classifies an image with an accuracy of 98.51% on training data and 95.83% on testing data. The parameters and hyperparameters used during training are listed in Table 2. The learning rate controls how fast the network weights are updated. The batch size controls the number of samples processed before the network updates its parameters. The number of epochs controls the number of training iterations. The dropout rate determines the percentage of parameters that are dropped during training to prevent overfitting. A minimal training sketch follows Table 2.

Table 2 Face mask detection model parameters and hyperparameters
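The sketch below illustrates the 93%/7% split and the training call. The arrays `images` and `labels`, the batch size, and the random seed are assumptions; the actual hyperparameter values are those reported in Table 2.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# `images` is assumed to be an array of preprocessed 224x224x3 images in [0, 1]
# and `labels` their one-hot encoded classes; `model` is the network built above.
x_train, x_test, y_train, y_test = train_test_split(
    images, labels, test_size=0.07, stratify=np.argmax(labels, axis=1),
    random_state=42)

history = model.fit(
    x_train, y_train,
    validation_data=(x_test, y_test),
    epochs=20,        # 20 epochs, as reported in Section 5.1
    batch_size=32,    # assumed value
)
```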

4.2 Untargeted FGSM (fast gradient sign method) attack on the proposed model

We first provide notations and then introduce the formulation of an adversarial attack on the proposed face mask detection system.

Notations: Let F be the face mask detection model, x an original input, l the true class of the input (the desired output of the model), r a perturbation, t the target class, and θ the parameters of the face mask detection model.

x ∈ samples from the dataset.

l ∈ {with-mask, without-mask, incorrect-mask}.

Formulation: Consider a sample x from the Sophisticated Face Mask Dataset, for instance, an image of a face with no mask. We compute a perturbation r for x. Adding this perturbation r to the original image x yields an adversarial example, i.e.

x′ = x + r is an adversarial example.

Note: x′ should be imperceptible to humans, i.e., it should still appear to a human as class l (no-mask). However, given x′ as input to F, the output of F is a class other than no-mask.

The adversarial attack against the proposed model is carried out by computing an adversarial perturbation for each test image and adding it to the corresponding clean image. We employed the FGSM strategy for perturbation computation due to its simplicity and attacking efficiency. The framework of the adversarial attack on the face mask model is shown in Fig. 6, and the TensorFlow methods and classes used to generate the adversarial examples are listed in Table 3. FGSM is a white-box attack in which the gradient of the loss is computed with respect to the input pixels and the sign operation is applied to the gradient matrix [17]. Back-propagation is used to calculate the gradient, which determines the perturbation direction. The image is perturbed by taking a single large step that increases the classifier's loss; intuitively, for each pixel of the input image, we ask whether the loss grows if that pixel's value is increased or decreased. As the last step, a small value ε (epsilon) is multiplied by the signed gradient matrix and the result is added to the input image. The perturbation is given by Equation (1):

$$\delta = \varepsilon \cdot \operatorname{sign}\left(\nabla_x \, \mathrm{loss}\left(x, l\right)\right)$$
(1)

Fig. 6 Framework of the adversarial attack on the proposed face mask model

Table 3 TensorFlow methods used for the generation of adversarial examples for the proposed face mask system

where,

l is the true label of the clean image x, ∇x loss(x, l) is the gradient of the loss function with respect to x at the current values of the model parameters, and ε (epsilon) is the magnitude of the perturbation, i.e., it bounds how much each pixel of x′ may differ from the corresponding pixel of x. The adversarial example is then computed as x′ = x + δ.

Since we apply an untargeted FGSM adversarial attack to the proposed face mask system, the corresponding objective function is given in Equation (2):

$$\min_{x^{\prime}} \left\Vert x^{\prime} - x \right\Vert$$

subject to

$$F(x) = \ell, \quad F\left(x^{\prime}\right) = \ell^{\prime}, \quad \ell \ne \ell^{\prime}$$
(2)

Here F(x) correctly classifies the input, but F(x′) misclassifies it as some other class. The objective function states that the magnitude of r must be small enough to remain imperceptible, yet still sufficient to make the model misclassify.

Following is the algorithm for generating the adversarial example for the face mask classifier model.

Algorithm 1 Adversarial examples for the face mask detection model

In STEP 1, we iterate through all the images in the test set for perturbation computation. In STEP 1.1, the selected image is preprocessed before being sent for prediction: it is resized to 224 × 224 × 3 and normalized by dividing by 255.0. The preprocessed image is then forwarded to the trained model for prediction in STEP 1.2. After the prediction, the loss between the actual label of the image and the prediction is computed in STEP 1.3. In STEP 1.4, the gradient, i.e., the rate of change of the loss with respect to the input image, is computed. The sign operation and epsilon are then applied to the gradient to compute the perturbation in STEP 1.5. The adversarial image is computed by adding the perturbation to the clean image in STEP 1.6. In STEP 2, the adversarial images are given as input to the face mask detection system for classification.
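The steps above can be sketched in TensorFlow as follows. The helper name `fgsm_example`, the loss object, and the assumption that the image is already preprocessed (shape (1, 224, 224, 3), scaled to [0, 1]) with a one-hot label are ours, not the exact implementation used in the experiments.

```python
import tensorflow as tf

loss_fn = tf.keras.losses.CategoricalCrossentropy()

def fgsm_example(model, image, label, epsilon=0.010):
    """Generate an untargeted FGSM adversarial example for one image."""
    image = tf.convert_to_tensor(image, dtype=tf.float32)
    with tf.GradientTape() as tape:
        tape.watch(image)                       # track gradients w.r.t. the pixels
        prediction = model(image)               # STEP 1.2: forward pass
        loss = loss_fn(label, prediction)       # STEP 1.3: loss against the true label
    gradient = tape.gradient(loss, image)       # STEP 1.4: d(loss)/d(pixels)
    perturbation = epsilon * tf.sign(gradient)  # STEP 1.5: epsilon * sign(gradient)
    adversarial = tf.clip_by_value(image + perturbation, 0.0, 1.0)  # STEP 1.6
    return adversarial
```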

We have illustrated the generation of adversarial perturbation and adversarial examples in Fig. 7. An adversarial image is produced by setting the epsilon value as 0.010. Although it is clear from the figure that the adversarial image is indistinguishable from the original image, the model still misclassifies it.

Fig. 7 Illustration of the adversarial example generation

The epsilon parameter in FGSM determines the magnitude of the perturbation, i.e., it represents the amount of change in pixel values in the adversarial image. In Fig. 8, we show adversarial images generated with different epsilon values. The impact of epsilon on the clarity of the adversarial image is visible: as epsilon increases, the changes in the adversarial image become perceptible.

Fig. 8 Adversarial images computed with different epsilon values: (a) epsilon = 0.00, (b) epsilon = 0.090, (c) epsilon = 0.0150, (d) epsilon = 0.020

4.3 Improving the robustness of the proposed face mask detection model

In this section, we design a framework that enhances the model's robustness to adversarial attacks. We employ an adversarial learning approach in which the dataset is modified by adding adversarial examples and the model is retrained on the modified dataset. However, we modify conventional adversarial learning by introducing an adversarial generator module that generates adversarial examples of the same image at different epsilon values and adds them to the Sophisticated Face Mask Dataset, whereas a conventional adversarial generator produces perturbations at a single epsilon value. Producing adversarial images with varying epsilon values ensures that the model is resilient to a variety of adversarial perturbations instead of a single one. The framework of the robust model is shown in Fig. 9.

Fig. 9 Framework of the robust face mask detection system

4.3.1 Formulation

The goal is to enhance the robustness of the face mask detection model F to adversarial attacks. We employ an adversarial learning approach by modifying the dataset, say D, with adversarial examples and retraining the model F on the modified dataset. We introduce an adversarial generator module A that takes an image and an epsilon value as input and outputs an adversarial example. Conventionally, A generates adversarial examples by adding perturbations at a single epsilon value. We instead modify A so that it generates adversarial examples of the same image at different epsilon values and adds them to the dataset D.

Let ε1, ε2, ..., εn be n different epsilon values. Then, A generates n different adversarial examples of the same image x, denoted x'11, x'12, ..., x'1n, corresponding to ε1, ε2, ..., εn, respectively. This modification aims to ensure that the model F is resilient to a variety of adversarial perturbations instead of a single perturbation. We create a modified dataset D' by adding the adversarial examples (x'11, x'12, ..., x'1n), (x'21, x'22, ..., x'2n), ..., (x'm1, x'm2, ..., x'mn) for each epsilon value to the original dataset D, where m is the number of randomly selected clean samples from D. That is, D' = {(xi, yi), (x'ij, yi) : i ∈ {1, 2, …, m}, j ∈ {1, 2, …, n}}, where (xi, yi) is the ith image-label pair in D and x'ij is the adversarial image of the ith clean image computed with the jth epsilon value εj. We retrain the model F on the modified dataset D' to enhance its robustness to adversarial attacks. Algorithm 2 outlines the steps of the robust face mask detection model.

Algorithm 2 Robust face mask detection model

STEP 1 iterates through a subset of the images in the regular dataset. Instead of computing the perturbation at a single epsilon value, we iterate through a range of epsilons in STEP 1.1; for example, e1, e2, ..., eN are the epsilon values used to generate the different adversarial examples of the same clean image. In STEP 1.1.1, the trained non-robust model predicts the output of the clean image. STEPS 1.1.2 and 1.1.3 compute the loss and the gradient, respectively. In STEP 1.1.4, the perturbation is computed by applying the sign operation to the gradient and multiplying it by the current epsilon value, i.e., perturbation = sign(gradient) * e1. STEP 1.1.5 computes the adversarial image for the current epsilon value, say e1. The generator iterates through all epsilon values to generate adversarial images of the same clean image at different epsilon values. In STEP 2, the non-robust model is retrained on the modified dataset consisting of clean and adversarial images. The resulting model is robust to adversarial attacks.
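The following sketch shows how such a multi-epsilon adversarial subset could be assembled, reusing the `fgsm_example` helper from Section 4.2. The epsilon schedule, subset size, and the function name `build_adversarial_dataset` are illustrative assumptions.

```python
import random
import numpy as np

EPSILONS = [0.003, 0.006, 0.009]   # illustrative epsilon schedule (e1, ..., eN)
SUBSET_SIZE = 2000                 # illustrative size of the random clean subset

def build_adversarial_dataset(model, x_train, y_train):
    """Return a random clean subset plus its adversarial copies at several epsilons."""
    indices = random.sample(range(len(x_train)), SUBSET_SIZE)
    images, labels = [], []
    for i in indices:
        x, y = x_train[i], y_train[i]
        images.append(x)                           # keep the clean image
        labels.append(y)
        for eps in EPSILONS:                       # STEP 1.1: iterate over epsilons
            x_adv = fgsm_example(model, x[None, ...], y[None, ...], eps)
            images.append(np.squeeze(x_adv.numpy(), axis=0))
            labels.append(y)                       # adversarial copy keeps the true label
    return np.stack(images), np.stack(labels)

# STEP 2: retrain the non-robust model on the modified dataset.
# x_aug, y_aug = build_adversarial_dataset(model, x_train, y_train)
# model.fit(x_aug, y_aug, epochs=20, batch_size=32)
```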

5 Experimental results

We trained the proposed face mask model on Google Colaboratory and then saved it to a local machine. The model was then loaded and the adversarial attack was executed on a local machine with 8 GB of RAM and an NVIDIA GeForce 930MX GPU. Finally, the non-robust face mask model was modified using the proposed framework and retrained on Google Colaboratory.

5.1 Pre attack model performance

The face mask detection model was trained for 20 epochs on 93% of the total data. The remaining 7% were used for validation/testing purposes. The model performed well and classified the images as the with-mask, without-mask, and incorrect-masked faces with an accuracy of 98.51% and 95.83% on training and testing data, respectively. The learning curves of the model are shown in Fig. 10.

Fig. 10 Learning curves of the model's accuracy and loss: (a) accuracy on training and testing data; (b) loss on training and testing data

The plot shows that as the number of epochs increases, the training and validation accuracy increase while the training and validation loss decrease. The testing and training accuracy remain close to each other, which indicates that the model is not overfitting.

Several other metrics were used to assess the model's performance, including recall, precision, F1-score, accuracy, macro average, and weighted average. All of these metrics were computed with the classification_report method of the scikit-learn package (a usage sketch follows the list below). The metrics are defined as follows:

  • Accuracy: represents the number of correctly classified data instances over the total number of data instances.

  • Precision: It is the proportion of correctly predicted positive observations to the total predicted positive observations.

  • Recall or sensitivity: It is given by the proportion of true positives to all positives.

  • F1-Score: It is the harmonic mean of recall and precision, computed as:

    $$\text{F1-Score} = \frac{2 \times \text{Recall} \times \text{Precision}}{\text{Recall} + \text{Precision}}$$
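As referenced above, the per-class metrics can be obtained directly from scikit-learn's classification_report; the class names and label arrays below are dummy placeholders, not the paper's results.

```python
from sklearn.metrics import classification_report

class_names = ["incorrect_mask", "with_mask", "without_mask"]
y_true = [0, 1, 2, 1, 0, 2]   # placeholder ground-truth labels
y_pred = [0, 1, 2, 0, 0, 2]   # placeholder model predictions
print(classification_report(y_true, y_pred, target_names=class_names, digits=2))
```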

We first computed the performance of the face mask detection model on the clean test dataset. The model was evaluated on 1,053 unseen test images, of which 353, 350, and 350 belong to incorrect-mask, with-mask, and without-mask, respectively. The model achieved a precision of 95%, a recall of 94%, and an F1-score of 94%. The results on the clean test data are shown in Fig. 11.

Fig. 11 Precision, recall, and F1-score of the face mask detection model on clean test data

The output of a face mask detection model is shown in Fig. 12. When given a clean masked face image, the model correctly predicts it as a masked image with 99.95% confidence.

Fig. 12 The model correctly classifies the image as masked

5.2 Post attack model performance

The classifier achieved excellent accuracy on the training data and performed well on the test data. However, this was not the case once it was attacked by the untargeted FGSM. As discussed in Section 4, the model's performance was evaluated on adversarial images generated at different epsilon values in order to compare the impact of each epsilon on the model's performance. It is evident from Table 4 that increasing the value of epsilon degrades the model's performance. At an epsilon value of 0.009, the accuracy of the model dropped from 95.83% to 14.53%, which is alarming. The accuracy and loss at different epsilon values are plotted in Fig. 13; the plot likewise shows that increasing epsilon decreases the model's accuracy.
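The per-epsilon evaluation can be sketched as a simple loop over adversarial copies of the test set; the epsilon list below and the reuse of `fgsm_example` and the earlier train/test split are assumptions.

```python
import numpy as np

# Evaluate the non-robust model on adversarial test sets at several epsilons.
for eps in [0.001, 0.003, 0.005, 0.007, 0.009]:   # illustrative epsilon values
    x_adv = np.concatenate([
        fgsm_example(model, x[None, ...], y[None, ...], eps).numpy()
        for x, y in zip(x_test, y_test)])
    loss, acc = model.evaluate(x_adv, y_test, verbose=0)
    print(f"epsilon={eps:.3f}  accuracy={acc:.4f}  loss={loss:.4f}")
```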

Table 4 Accuracy and loss of regular model on various epsilon of FGSM
Fig. 13 Accuracy and loss curves on various epsilon values

In addition, we have computed the precision, recall, and f1-score on different epsilon values. The results reinforce our conclusion that the performance of the model has degraded. The list of classification reports is shown in Fig. 14.

Fig. 14 Accuracy, precision, recall, and F1-score on different epsilon values

The adversarial version of the clean masked image created by FGSM forces the model to classify it incorrectly as an incorrect mask with 100% confidence, as shown in Fig. 15. Notably, the model mislabels the adversarial image with 100% confidence, which even exceeds the 99.95% confidence with which it correctly classified the clean image.

Fig. 15 The model misclassifies an adversarial image as an 'incorrect mask' with 100% confidence

5.3 Robust model performance

The proposed robust framework is based on adversarial learning. However, due to limited resources, including RAM, the adversarial generator selected only a random subset of the training dataset for adversarial example generation and added the adversarial examples to that subset. The model was then retrained on the modified dataset containing the subset of training data and its adversarial examples. Despite using only a portion of the training data, the model generalized well. Figure 16 depicts the robust model's learning curves. There was a noticeable drop in accuracy on clean data, from 95.78% to 92.79%, when evaluated on the modified dataset, but the defended model performs remarkably well on the images that once fooled it, maintaining an accuracy of 92% on adversarial images computed at different epsilon values. Table 5 shows the performance of the robust model under the FGSM attack at different epsilons. Compared to the regular model's post-attack performance in Table 4, the robust model performed well on adversarial data; e.g., at epsilon 0.009, the non-robust model's accuracy is 14.53%, whereas the robust model achieved 92% accuracy.

Fig. 16 (a) Training vs. validation accuracy of the robust model on the modified dataset; (b) training and testing loss

Table 5 Accuracy and loss of robust model on various epsilon of FGSM

The accuracy and loss of the robust model at different epsilon values are plotted in Fig. 17. The model performed well even under varying epsilon values of the FGSM attack, maintaining an accuracy of about 92% on adversarial data.

Fig. 17 The robust model's accuracy and loss on different epsilon values of FGSM

The output of the robust model is shown in Fig. 18, where it correctly classifies the adversarial image of the masked face into its intended class, 'with mask'. Note that the same adversarial image fooled the regular model into misclassifying it as an 'incorrect mask', as shown in Fig. 15.

Fig. 18 The robust model correctly classifies an adversarial image of the masked face with 92.23% confidence

6 Conclusion

In this study, we demonstrated the vulnerabilities of the deep learning-based face mask detection model against adversarial examples and designed a framework for a more robust face mask detection system resistant to these attacks.

We began by introducing the custom dataset used to train the face mask detection model. The model was created by applying transfer learning of the state-of-the-art classification model MobileNetV2 and attained a respectable accuracy of 95.83% when tested on clean inputs, as shown in Figs. 10 and 11. The model was subsequently subjected to adversarial examples computed at various epsilon values of the FGSM attack. The results in Table 4 revealed that the model's performance declined further as the epsilon value increased; at epsilon 0.009, the model's accuracy dropped to 14.53%, which is alarming. Although we employed extremely small epsilon values to create imperceptible perturbations, the model misclassified the adversarial inputs with greater certainty than the clean ones. In Fig. 15, the adversarial image of the masked face was recognized as an incorrectly masked face with 100% confidence. During the FGSM attack, the majority of the images in the test data were wrongly classified as incorrectly masked faces.

We employed the adversarial learning strategy to enhance the robustness of the model, which involved modifying the custom dataset with adversarial samples. However, in our defense approach, we modified the conventional adversarial learning approach by using only a portion of the total dataset and generating adversarial examples on the combination of epsilon values. The results in Table 5 and Fig. 17 demonstrate that the model is robust, achieving 92% accuracy on adversarial examples.

Face mask identification is critical for effectively combating COVID-19. Thus, further study of vulnerabilities and defensive techniques is required before these systems can be deployed. We believe our findings will assist researchers in improving the security of their models and raise awareness of the need to build face mask detection models with multiple protection strategies.

7 Future scope

In the future, we would like to work on the following areas:

  • The present study is based on evaluating the model’s performance on adversarial images. There is a potential to carry out adversarial attacks on the face mask surveillance systems, forcing the model to misclassify from the live video data.

  • In our particular scenario, we attacked the model using the white-box FGSM attack because the specifics of the model, including its gradient, architecture, and training data, were readily available. In practical cases, when the model is deployed, these specifics are often concealed from users; even then, there is still scope to attack the model using black-box strategies [7].

  • In our work, we used the FGSM strategy for adversarial example computation due to its simplicity. In the future, we would like to attack the model using other attack strategies, such as the Basic Iterative Method (BIM) [34] and the Jacobian-based Saliency Map Attack (JSMA) [47].

  • As discussed, there are other strategies for generating adversarial perturbations. We aim to design a universal framework that is robust to adversarial examples generated by any strategy.

  • To evaluate the model's performance on adversarial images, we computed the perturbation of each clean image separately and added it to its corresponding clean image. However, it is also possible to compute a universal perturbation [43], i.e., a single perturbation matrix applied to all test samples.

  • Due to limited resources, the model was retrained on a fraction of the training data and adversarial examples during adversarial learning. Although the robust model performed well on adversarial examples, there is room for improvement by retraining the model on the complete dataset and their adversarial examples.

  • In the future, we aim to investigate several different defensive strategies, such as network distillation, which extracts knowledge from deep neural networks to ensure their robustness [48], and adversarial example detection during the testing stage [40].