Transfer learning based cascaded deep learning network and mask recognition for COVID-19

Li, Fengyin; Wang, Xiaojiao; Sun, Yuhong; Li, Tao; Ge, Junrong

doi:10.1007/s11280-023-01149-z

Published: 26 May 2023

World Wide Web, Volume 26, pages 2931–2946 (2023)

Abstract

COVID-19 is still spreading and has caused great harm worldwide. Detection systems at the entrances of public places such as shopping malls and stations should check whether pedestrians are wearing masks. However, pedestrians often pass inspection by wearing cotton masks, scarves, or other coverings. A detection system therefore needs to check not only whether a pedestrian is wearing a mask but also what type of mask it is. Based on the lightweight network architecture MobilenetV3, this paper proposes a cascaded deep learning network based on transfer learning and then designs a mask recognition system built on it. Two MobilenetV3 networks suitable for cascading are obtained by modifying the activation function of the MobilenetV3 output layer and the structure of the model. Transfer learning is introduced into the training of the two modified MobilenetV3 networks and of a multi-task convolutional neural network, so that low-level parameters pretrained on ImageNet are available in advance, which reduces the computational load of the models. The cascaded deep learning network consists of a multi-task convolutional neural network cascaded with the two modified MobilenetV3 networks. The multi-task convolutional neural network detects faces in images, and the two modified MobilenetV3 networks serve as backbone networks that extract mask features. Compared with the modified MobilenetV3 network before cascading, the cascaded network improves classification accuracy by 7 percentage points, which demonstrates its performance.

1 Introduction

Since its outbreak, COVID-19 has spread to every continent and become a global pandemic. While causing serious harm to people’s lives and health, COVID-19 has also had a major impact on the economy, society and politics [1]. The virus spreads through aerosols or droplets produced when infected people talk, cough, or sneeze, and healthy people in close contact with infected people may be infected through direct contact with these aerosols or droplets. Wearing a mask is therefore an effective way to avoid infection and limit the spread of COVID-19 [2]. The World Health Organization (WHO) calls on people to wear masks in workplaces, schools and shops without adequate ventilation, and areas where COVID-19 is spreading should strictly abide by this guideline [3].

At present, cameras at the entrances of public places detect whether pedestrians are wearing masks. However, current detection systems are not robust: as long as pedestrians cover their faces, they pass detection and can walk through the gate. Many pedestrians lack safety awareness and go out without masks. When entering large venues, they pass system detection by covering their faces with their hands or with scarves. Others, for reasons of thrift, pass the detection system by wearing cotton masks, which are less protective but can be reused [4]. Such attempts to deceive a detection system are called presentation attacks, and the biometric artifacts or related objects used in them are called Presentation Attack Instruments (PAIs). The ability of a detection system to identify PAIs is called Presentation Attack Detection (PAD) [5]. Generally, the stronger the PAD of a system, the more types of PAIs it can recognize. An ideal PAD system should detect all known PAIs as well as unknown PAIs that appear in the future [6]. Likewise, an ideal mask recognition system should be able to detect all kinds of face coverings.

Therefore, the detection system at the entrance of a public place should be able to distinguish pedestrians wearing masks from those wearing other coverings. In this paper, a cascaded [7] network model for identifying mask types is proposed. The model processes the images captured by the camera to determine whether the face in an image wears a mask and, if so, which type of mask. Since the main purpose of the detection system is to admit pedestrians who are wearing masks, the cascaded network constructed here aims to improve the ability of the system to identify masks, not its ability to identify specific types of PAIs [8].

In this paper, a cascaded three-stage mask classification network is constructed by passing images sequentially through a multi-task convolutional neural network (MTCNN) [9] and two modified MobilenetV3 networks [10,11,12]. The cascaded network classifies images into faces wearing masks, faces wearing other coverings, and faces without masks. The main contributions of the paper are listed below.

Two modified MobilenetV3 networks are obtained by modifying the lightweight network architecture MobilenetV3, namely by changing the activation function of its output layer and the structure of the model.
To address the many parameters, long training time and heavy computation of deep neural networks, transfer learning is applied to the training of MTCNN and of the two modified MobilenetV3 networks, yielding deep neural networks based on transfer learning. During training, the low-level layers reuse the transferred parameters, which reduces both the training time and the amount of computation.
A cascaded deep learning network based on transfer learning is proposed by cascading MTCNN with the two modified MobilenetV3 networks. MTCNN first detects faces, which removes the influence of the background and of the clothing of pedestrians on the subsequent mask classification. The first modified MobilenetV3 network then performs mask detection, and the second performs mask recognition. After this double screening, the accuracy of the cascaded network in identifying qualified masks is effectively improved.
A mask recognition system is designed on top of the cascaded deep learning network. By recognizing face masks, the system identifies pedestrians who wear no mask or wear other coverings, and ensures that pedestrians entering large venues during the COVID-19 epidemic wear masks.

The rest of the paper is organized as follows. Section 2 introduces the related research on mask detection and recognition. Section 3 describes the construction and training of the cascaded network model. Section 4 uses a cascaded network for mask recognition. Section 5 analyzes the results of the experiments. Section 6 presents the conclusion.

2 Related work

Because COVID-19 spreads easily, people must wear masks when traveling, and even more so when entering large, crowded places. Algorithms for detecting whether pedestrians are wearing masks have therefore been studied.

Riyadh et al. [13] proposed a method to detect whether a face is wearing a mask. The method first uses OpenCV for face detection and then uses the MobileNetV2 network for mask detection. It is implemented with TensorFlow and OpenCV in a Jupyter Notebook environment. During face detection, candidate face boxes are filtered by confidence, which represents the probability that a box contains a face. A box with high confidence is retained and a box with low confidence is discarded, so faces are detected more accurately. MobileNetV2, a lightweight network with a short running time, is used for mask detection. However, this method has two drawbacks. First, it requires the mask to properly cover the mouth and nose area; otherwise the probability of detecting the mask drops. Second, the authors only detect whether pedestrians wear masks, so if a pedestrian covers the face with another object, the detection system concludes that the pedestrian is wearing a mask.
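The confidence filtering described above can be sketched in a few lines. This is a minimal illustration only: the box layout, the function name `filter_by_confidence`, and the threshold of 0.5 are assumptions made for the example and are not taken from [13].

```python
from typing import List, Tuple

# Hypothetical box layout: (x1, y1, x2, y2, confidence).
Box = Tuple[int, int, int, int, float]


def filter_by_confidence(boxes: List[Box], threshold: float = 0.5) -> List[Box]:
    """Keep only the boxes whose confidence is at least `threshold`."""
    return [box for box in boxes if box[4] >= threshold]


# Made-up detections, used only to exercise the function.
detections = [
    (10, 10, 60, 60, 0.98),
    (5, 80, 30, 110, 0.21),
    (100, 40, 150, 95, 0.74),
]
kept = filter_by_confidence(detections, threshold=0.5)
print(kept)
```

Raising the threshold discards more false face boxes at the cost of missing some real faces, so the value would normally be tuned on validation data.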

Puja Gupta et al. [14] proposed a ResNet-based multi-pose face presentation attack detection method. The method can detect whether a pedestrian is wearing a mask even when the whole body of the pedestrian is within the range of the camera and the posture varies. It uses the Extended Mask R-CNN algorithm (Ex-Mask R-CNN) to detect individuals wearing masks and ResNet-152 to extract facial features from input images. Although the method achieves good results on the CASIA-SURF database, it does not solve the difficult problem of detecting unknown attacks.

Su et al. [15] proposed a new algorithm for mask detection and classification that integrates transfer learning and deep learning. The algorithm combines transfer learning with an Efficient-Yolov3 neural network for mask detection, using EfficientNet as the backbone feature extraction network and MobileNet as the classifier. Combining the two networks exploits the strengths of each and enhances the generalization ability of the model. However, the algorithm does not address the detection of unknown PAIs.

As the above discussion shows, many mask detection methods have been proposed, but none of them addresses the problem of unknown PAIs [16, 17]. The cascaded neural network proposed in this paper focuses detection on qualified masks. As long as all pedestrians wearing qualified masks can be identified, it does not matter which coverings the remaining, unqualified pedestrians are wearing, so even a new, unknown PAI does not affect the judgment of the cascaded system.

3 Cascaded deep learning network based on transfer learning

In this section, we construct and train the MTCNN, MobilenetV3a, and MobilenetV3b networks in the cascaded deep learning network model. Section 3.1 presents the construction process of the cascaded network. Section 3.2 constructs the three sub-networks and obtains MobilenetV3a and MobilenetV3b by modifying MobilenetV3. Section 3.3 uses transfer learning [18] to train the constructed network models and obtain the trained parameters.

3.1 Cascade network

The cascaded deep learning network consists of three sub-networks: MTCNN, MobilenetV3a and MobilenetV3b. The first step in building it is to construct the sub-network models. The activation function of the MobilenetV3 output layer and the structure of the model are modified to obtain the MobilenetV3a and MobilenetV3b networks. Mask identification first requires cropping the face of the person from the image, which removes interference from factors such as background and clothes, so MTCNN is used for face detection. MobilenetV3a is then used to detect masks, and MobilenetV3b to identify them. After the network models are constructed, transfer learning is used to train MTCNN, MobilenetV3a and MobilenetV3b separately [19]. The low-level parameters of each sub-network are taken from ImageNet pretraining, which reduces the training time. Once the three network models are constructed and trained, they are cascaded to obtain a cascaded deep learning network based on transfer learning [20, 21].

3.2 Construction of cascaded network model

In this section, each sub-network is first analyzed, and the specific functions that each network needs to implement are clarified. The process and details of each sub-network model construction are given in detail.

In the face detection stage, an MTCNN model is constructed. It is a multi-task neural network model for face detection proposed by the Shenzhen Research Institute of the Chinese Academy of Sciences in 2016. MTCNN consists of three cascaded networks, which are the P-Net network for generating face candidate boxes, the R-Net network for filtering and selecting high-precision candidate boxes, and the O-Net network for generating the final bounding box. MTCNN can exclude the influence of lighting, pose, and occlusion conditions, and perform face detection in an unconstrained environment. The MTCNN network is used here for face detection and returns the captured face image. If no face is detected, the program aborts this run.

In the mask detection stage, a MobilenetV3a network is constructed by modifying MobilenetV3. MobilenetV3a divides the input images into two categories: wearing a mask and not wearing a mask. In MobilenetV3a, the features of the input image are extracted by convolutional layers. Each convolutional layer contains multiple convolution kernels, and different kernels extract different features, which makes classification by features convenient. To reduce the risk of overfitting, batch normalization is applied to the data in MobilenetV3a. Since MobilenetV3a performs binary classification, the activation function of the MobilenetV3 output layer is changed to the sigmoid function, which maps the output value of the network to a value between 0 and 1. In this way, the MobilenetV3a network is constructed.
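To make the role of the output activation concrete, the following sketch shows how a sigmoid turns a single raw network output into a two-class decision. The helper names, the 0.5 decision threshold, and the choice of "mask" as the positive class are assumptions for illustration; the paper does not state them.

```python
import math


def sigmoid(z: float) -> float:
    """Logistic function, written to avoid overflow for large |z|."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_mask(logit: float, threshold: float = 0.5) -> str:
    """Map one raw output value to a label (threshold is an assumption)."""
    return "mask" if sigmoid(logit) >= threshold else "NO-mask"


print(sigmoid(0.0))        # the midpoint of the (0, 1) range
print(predict_mask(2.0))   # positive raw output
print(predict_mask(-2.0))  # negative raw output
```

With a threshold of 0.5 the decision depends only on the sign of the raw output; a different threshold would trade false accepts against false rejects.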

In the mask recognition stage, a MobilenetV3b network is constructed by modifying MobilenetV3. MobilenetV3b classifies input images into qualified and unqualified images: qualified images show qualified masks, and unqualified images show no mask or other coverings (veils, cotton masks, scarves, etc.). MobilenetV3b applies dropout after the pooling layer: neural network units are discarded from the network with a certain probability to prevent overfitting. Since MobilenetV3b performs multi-class classification, the activation function of its output layer is softmax. In this way, MobilenetV3b is also built.
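The two ingredients named above, dropout and a softmax output, can be sketched without any deep learning library. This is an illustrative sketch, not the authors' implementation: the dropout rate, the inverted-dropout scaling, the label order, and the example logits are all assumptions.

```python
import math
import random
from typing import List


def softmax(logits: List[float]) -> List[float]:
    """Convert raw scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def dropout(values: List[float], rate: float, rng: random.Random) -> List[float]:
    """Inverted dropout for training: zero each unit with probability
    `rate` and scale the survivors by 1 / (1 - rate)."""
    keep = 1.0 - rate
    return [v / keep if rng.random() < keep else 0.0 for v in values]


# Label order and logits are hypothetical.
LABELS = ["OK-mask", "NG-mask", "NO-mask"]
probs = softmax([2.0, 0.5, -1.0])
predicted = LABELS[probs.index(max(probs))]
print(predicted, probs)

# rate=0.5 is an assumed value; the paper only says "a certain probability".
print(dropout([1.0, 2.0, 3.0, 4.0], rate=0.5, rng=random.Random(0)))
```

Dropout is only active during training; at inference time the units are left untouched, which is why the surviving activations are rescaled during training.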

After the three models are constructed, the next step is to train them separately.

3.3 Training of the network model

During training, the training of the three sub-network models is performed independently and without association.

Transfer learning is applied during MTCNN model training, and MTCNN parameters pretrained on the ImageNet dataset are used to reduce model training time. Transfer learning refers to transferring the parameters of a trained model to a new model to help train the new model. The new model uses the transferred parameters as they are and does not retrain them, which is called freezing the parameters. Because a model has many parameters and is trained on a large amount of data, training takes a long time. Using transfer learning reduces the number of parameters that have to be trained and therefore the time spent on model training [22].

The images in the dataset have already been cropped with faces. When training MobilenetV3a and MobilenetV3b models, it is no longer necessary to use MTCNN for preprocessing, which greatly improves the efficiency of model training. The training process of MobilenetV3a and MobilenetV3b has the same idea. The training process is divided into the following 5 steps:

(1) The dataset is divided first. The input images are randomly shuffled to enhance the generalization ability of the model [23], and then 90% of the data is randomly selected as the training set and the remaining 10% as the test set. Dividing the dataset randomly helps make the model more robust [24].
(2) Transfer learning is applied during model training to reduce training time. MobilenetV3a and MobilenetV3b use the parameters of the first 80 layers of MobilenetV3 pretrained on ImageNet, and only the parameters after layer 80 are trained. Figure 1 shows the process for MobilenetV3a as an example. In Figure 1, the first layer of MobilenetV3 is the input layer, the middle layers are hidden layers, and the last layer is the output layer. The same holds for MobilenetV3a. The part inside the dashed box indicates the layers for which MobilenetV3a uses the parameters of the corresponding MobilenetV3 layers.
(3) Conditions for early stopping are set. If the training loss has reached its minimum, training can stop before the specified number of epochs. The training loss is monitored during training. When 3 epochs pass without a change in the training loss, the performance of the model has stopped improving, and a reduction of the learning rate is triggered in order to lower the training loss further. If, even after the learning rate has been reduced, 10 epochs pass without a change in the training loss, the loss is considered to have stopped decreasing and training is stopped early.
(4) If the model reaches the specified number of epochs but the training loss has not reached its minimum, unfrozen training is needed. Training only the parameters after layer 80 is called simple training, and the number of epochs for simple training is set to 50. If training has not stopped early within these 50 epochs, training only the parameters after layer 80 cannot minimize the loss. The first 80 layers of the model are then unfrozen and trained as well. Training stops when it completes or when the early stopping conditions are met.
(5) During training, the parameters are saved every 2 epochs to the corresponding path.
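Steps (1) and (3) above can be sketched in plain Python. This is a hedged illustration, not the authors' training code: the names `shuffle_and_split` and `PlateauMonitor`, the learning-rate reduction factor of 0.5, the starting learning rate, and the reading of "loss has not changed" as "loss has not improved on its best value" are all assumptions. The patience values of 3 and 10 epochs and the 90/10 split come from the text.

```python
import random
from typing import List, Sequence, Tuple


def shuffle_and_split(items: Sequence, train_fraction: float = 0.9,
                      seed: int = 0) -> Tuple[List, List]:
    """Shuffle a copy of `items`, then cut it into train and test parts."""
    rng = random.Random(seed)  # fixed seed only to make the sketch repeatable
    shuffled = list(items)
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_fraction)
    return shuffled[:cut], shuffled[cut:]


class PlateauMonitor:
    """Watches the training loss. After every `lr_patience` epochs without
    improvement the learning rate is multiplied by `factor` (assumed value);
    after `stop_patience` such epochs training should stop."""

    def __init__(self, lr: float, lr_patience: int = 3,
                 stop_patience: int = 10, factor: float = 0.5) -> None:
        self.lr = lr
        self.lr_patience = lr_patience
        self.stop_patience = stop_patience
        self.factor = factor
        self.best = float("inf")
        self.stale = 0  # epochs since the last improvement

    def update(self, loss: float) -> bool:
        """Record one epoch's loss; return True if training should stop."""
        if loss < self.best:
            self.best = loss
            self.stale = 0
            return False
        self.stale += 1
        if self.stale % self.lr_patience == 0:
            self.lr *= self.factor
        return self.stale >= self.stop_patience


# 756 stands in for a dataset of that many image indices.
train_ids, test_ids = shuffle_and_split(range(756))

# Made-up loss curve: improves for three epochs, then stays flat.
monitor = PlateauMonitor(lr=1e-3)
losses = [1.0, 0.8, 0.7] + [0.7] * 12
stopped_at = None
for epoch, loss in enumerate(losses, start=1):
    if monitor.update(loss):
        stopped_at = epoch
        break
print(len(train_ids), len(test_ids), stopped_at, monitor.lr)
```

Whether the 10-epoch window is counted from the last improvement (as here) or from the first learning-rate reduction is not specified in the text, so the stopping epoch in a real run may differ.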

The MobilenetV3b model initially uses the ImageNet weights with the first 80 layers of the network frozen. However, unfrozen training is found to give better results, so the final MobilenetV3b model is trained from scratch on three classes of images: with masks, without masks, and with other occlusions. In this way, the parameters of the trained MobilenetV3a and MobilenetV3b models are obtained.
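The freezing and unfreezing of the first 80 layers can be pictured with a toy layer list. The `Layer` class, the helper names, and the total of 100 layers are hypothetical stand-ins; a real framework exposes a comparable per-layer trainable switch, but no specific API is implied here.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Layer:
    """Toy stand-in for a network layer; only the trainable flag matters."""
    name: str
    trainable: bool = True


def freeze_first(layers: List[Layer], n: int) -> None:
    """Freeze the first n layers and leave the rest trainable."""
    for i, layer in enumerate(layers):
        layer.trainable = i >= n


def unfreeze_all(layers: List[Layer]) -> None:
    """Make every layer trainable again (unfrozen training)."""
    for layer in layers:
        layer.trainable = True


# The total layer count of 100 is an assumption for the example.
layers = [Layer(f"layer_{i}") for i in range(100)]

freeze_first(layers, 80)  # "simple training": only layers after 80 learn
trainable_simple = sum(layer.trainable for layer in layers)

unfreeze_all(layers)      # unfrozen training: all layers learn
trainable_full = sum(layer.trainable for layer in layers)

print(trainable_simple, trainable_full)
```

Freezing shrinks the set of parameters the optimizer has to update, which is where the reduction in training time comes from.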

In this way, the construction and training of the three sub-network models are completed, and the three networks can be cascaded. First, MTCNN detects the faces in the input images and crops them from the images. The face images are then input into MobilenetV3a for mask detection. If the person in an image is wearing a mask, the image is fed into MobilenetV3b to classify the mask. The construction, training and testing process of the cascaded network is shown in Figure 2.

4 Mask recognition based on cascaded deep learning network

After the construction and training of the sub-network models, the cascaded network can be used for mask recognition. In this section, we use a cascaded deep learning network to detect whether pedestrians wear qualified masks. Section 4.1 introduces the datasets used, including the types, numbers, and sizes of their images, and shows examples of the different image types. Section 4.2 describes the preprocessing of the dataset. Section 4.3 clarifies the process of mask recognition by the cascaded neural network. Section 4.4 gives the details of mask recognition with the cascaded network.

4.1 Datasets

The first dataset is the face mask dataset. Its images are 16 × 16 in size and fall into two categories: faces with masks and faces without masks. An image with a mask is labeled mask, and an image without a mask is labeled NO-mask. The dataset contains 313 images of faces wearing masks and 443 images of faces without masks, for a total of 756 images. We call this dataset “Face-Mask1” here.

The second dataset is the mask classification dataset. Its images are 224 × 224 in size, and the masks in the images are divided into qualified and unqualified masks. A qualified mask is labeled OK-mask. Qualified masks include N95 masks and disposable surgical masks, and there are 1361 such images. An unqualified mask is labeled NG-mask. Unqualified masks mainly include sponge masks, cloth masks and scarves, and there are 1880 such images. The dataset contains a total of 3241 images for mask classification. We call this dataset “Face-Mask2” here. Pictures of the different categories in the two datasets are shown in Figure 3.

4.2 Preprocessing

Before the images are passed into the MobilenetV3a network for classification, they go through a preprocessing stage that includes face detection, face alignment, and normalization. First, the MTCNN algorithm detects the faces and marks the facial landmarks, which are used to crop the faces from the images. MTCNN uses an image pyramid to handle faces of different sizes. Next, the coordinates of the eyes are used to perform an affine transformation that aligns the faces [25]. Finally, each face image is normalized so that its values fall within a similar range, which is convenient for subsequent processing [26].
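The image pyramid mentioned above can be illustrated by computing the sequence of scale factors at which the image would be resized. The minimum face size of 20 pixels, the scale factor of 0.709, and the 12-pixel network input are values commonly seen in open-source MTCNN implementations; they are assumptions here and are not stated in the paper.

```python
from typing import List


def pyramid_scales(width: int, height: int, min_face_size: int = 20,
                   factor: float = 0.709, net_size: int = 12) -> List[float]:
    """Return the resize factors of an image pyramid.

    The first scale maps a face of `min_face_size` pixels onto the
    `net_size` input of the first-stage network; each further level
    shrinks the image by `factor` until its shorter side would be
    smaller than `net_size`.
    """
    scales = []
    scale = net_size / min_face_size
    while min(width, height) * scale >= net_size:
        scales.append(scale)
        scale *= factor
    return scales


scales = pyramid_scales(224, 224)
print(len(scales), [round(s, 3) for s in scales])
```

Small scales let the fixed-size detector window cover large faces, while the largest scale catches the smallest faces, which is how one detector handles faces of different sizes.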

4.3 Mask recognition process

After preprocessing, the data is passed to the cascaded network for classification. The input images are first passed into MTCNN, which detects the faces and crops them from the images. The MobilenetV3a network then classifies the face images into two categories: wearing masks and not wearing masks. Images without masks are judged unqualified, and images with masks are passed to MobilenetV3b. MobilenetV3b classifies the incoming images in more detail into images with qualified masks, images with other occlusions, and images without masks. Images with qualified masks are judged qualified, and the other two types are judged unqualified. The process of mask recognition using the cascaded network is shown in Figure 4.
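The decision flow of the three stages can be summarized as a short control-flow sketch. The three stages are passed in as plain callables so the example runs without any model; the function name `cascade_decision`, the callables, and the toy "image" are hypothetical stand-ins for MTCNN, MobilenetV3a and MobilenetV3b.

```python
from typing import Any, Callable, List, Sequence


def cascade_decision(image: Any,
                     detect_faces: Callable[[Any], Sequence[Any]],
                     wears_mask: Callable[[Any], bool],
                     classify_mask: Callable[[Any], str]) -> List[str]:
    """Return one decision per detected face.

    Stage 1 (detect_faces) crops faces, stage 2 (wears_mask) rejects
    bare faces, stage 3 (classify_mask) accepts only the OK-mask label.
    An image without faces yields an empty list.
    """
    decisions = []
    for face in detect_faces(image):
        if not wears_mask(face):
            decisions.append("unqualified")
            continue
        label = classify_mask(face)
        decisions.append("qualified" if label == "OK-mask" else "unqualified")
    return decisions


# Toy stand-in: an "image" is a list of already-described faces.
fake_image = [
    {"covered": True, "label": "OK-mask"},
    {"covered": True, "label": "NG-mask"},
    {"covered": False, "label": "NO-mask"},
]
decisions = cascade_decision(
    fake_image,
    detect_faces=lambda img: img,
    wears_mask=lambda face: face["covered"],
    classify_mask=lambda face: face["label"],
)
print(decisions)
```

Because bare faces are rejected in stage 2, stage 3 only has to separate qualified masks from other coverings, which is the double screening the cascade relies on.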

4.4 Mask recognition details

After determining the process of identifying masks by the cascade network, the next step is to create models, read parameters, and identify masks.

First, an MTCNN model is created to crop faces from images. The acceptance criteria for the Pnet, Rnet, and Onet networks in MTCNN are specified, and the weights pretrained on the ImageNet dataset are used. Next, a MobilenetV3a model is created to identify whether a face is wearing a mask. The image labels are NO-mask and mask, so the number of labels is 2 and MobilenetV3a divides the images into 2 categories. The MobilenetV3a weights obtained in the training phase are loaded. Finally, a MobilenetV3b model is created to identify whether the mask is qualified. The image labels are OK-mask, NG-mask and NO-mask, so the number of labels is 3 and MobilenetV3b divides the images into 3 categories. The MobilenetV3b weights obtained in the training phase are loaded.

After the models are created, mask identification begins. First, the images are passed into MTCNN to crop the faces. If MTCNN does not detect a face, the program aborts. Otherwise, MTCNN returns the coordinates of the face box and of the facial key points. If an image contains more than one face, MTCNN returns more than one face box. The face boxes returned by MTCNN may be non-square rectangles. Since MobilenetV3a requires square input images, each face box is converted into a square without losing any part of the original box. A face box may also extend beyond the image, so it is adjusted so that it does not exceed the bounds of the image. In this way, the processed face box coordinates and facial key point coordinates are obtained.
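The two box adjustments described above can be sketched as follows. This is one plausible reading of the text, not the authors' code: the square is grown around the box center using the longer side, and clipping is done by clamping coordinates to the image. The function names and the sample boxes are made up.

```python
from typing import Tuple

Box = Tuple[int, int, int, int]  # (x1, y1, x2, y2)


def to_square(box: Box) -> Box:
    """Expand a box to a square around its center, using the longer side
    so that no part of the original box is lost."""
    x1, y1, x2, y2 = box
    w, h = x2 - x1, y2 - y1
    side = max(w, h)
    new_x1 = int(round(x1 + w / 2 - side / 2))
    new_y1 = int(round(y1 + h / 2 - side / 2))
    return (new_x1, new_y1, new_x1 + side, new_y1 + side)


def clip_to_image(box: Box, width: int, height: int) -> Box:
    """Clamp box coordinates so the box stays inside the image.
    Clipping can make the box non-square again, so the crop would still
    be resized to the square network input afterwards."""
    x1, y1, x2, y2 = box
    return (max(0, x1), max(0, y1), min(width, x2), min(height, y2))


tall_box = (30, 10, 70, 90)  # 40 wide, 80 tall
square = to_square(tall_box)
print(square)

edge_box = (-10, 20, 50, 60)  # partly outside a 100 x 100 image
clipped = clip_to_image(to_square(edge_box), width=100, height=100)
print(clipped)
```

Growing the box to the longer side keeps the whole face and mask in view, at the cost of including a little more background than the original rectangle.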

Then, the face boxes are cropped from the original image, and the cropped images are the face images. The faces may be tilted, and straightening them helps improve the recognition performance of the model [27]. The key points of a face include the eyes, and the coordinates of the eyes are used to straighten the face with an affine transformation. The straightened face images are normalized so that their values fall within a similar range, which is convenient for later data processing. The normalized images are passed into MobilenetV3a, which classifies them into images with masks and images without masks.
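The eye-based alignment can be sketched as a rotation about the midpoint between the eyes. The code below only builds the 2 × 3 affine matrix and applies it to points; warping an actual image would apply the same matrix to every pixel with an image library. The function names, the eye coordinates, and the division by 255 for normalization are assumptions for illustration.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, float]


def eye_angle(left_eye: Point, right_eye: Point) -> float:
    """Angle in radians of the line through the eyes, relative to the x-axis."""
    return math.atan2(right_eye[1] - left_eye[1], right_eye[0] - left_eye[0])


def leveling_matrix(center: Point, angle: float) -> List[List[float]]:
    """2x3 affine matrix that rotates by -angle about `center`,
    which makes a line tilted by `angle` horizontal."""
    c, s = math.cos(angle), math.sin(angle)
    cx, cy = center
    return [[c, s, cx - c * cx - s * cy],
            [-s, c, cy + s * cx - c * cy]]


def apply_affine(m: Sequence[Sequence[float]], p: Point) -> Point:
    """Apply a 2x3 affine matrix to one point."""
    x, y = p
    return (m[0][0] * x + m[0][1] * y + m[0][2],
            m[1][0] * x + m[1][1] * y + m[1][2])


def normalize_pixels(values: Sequence[float]) -> List[float]:
    """Scale 8-bit pixel values into [0, 1] (assumed normalization)."""
    return [v / 255.0 for v in values]


# Made-up eye positions on a tilted face.
left_eye, right_eye = (30.0, 40.0), (70.0, 60.0)
center = ((left_eye[0] + right_eye[0]) / 2, (left_eye[1] + right_eye[1]) / 2)
m = leveling_matrix(center, eye_angle(left_eye, right_eye))
new_left = apply_affine(m, left_eye)
new_right = apply_affine(m, right_eye)
print(new_left, new_right)
print(normalize_pixels([0, 128, 255]))
```

After the transformation both eyes share the same vertical coordinate and keep their original distance, so the face is straightened without being stretched.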

Finally, the images with masks are passed into MobilenetV3b, which classifies them into images with qualified masks, images with other occlusions, and images without masks. Pedestrians whose images show qualified masks can pass through the gates. The other two categories are judged unqualified, and the corresponding pedestrians are not allowed through.

5 Experimental results

To compare the classification performance of a single neural network model with that of a cascaded neural network model, we conduct three experiments in this section. Experiments 1 and 2 test MobilenetV3a and MobilenetV3b, respectively. In Experiment 3, the images processed by MTCNN are passed into the cascaded network for classification. MobilenetV3b in Experiment 2 serves as the single neural network model, and its results are compared with the results of the cascaded network in Experiment 3.

Table 1 Experimental details

5.1 Experiment introduction

Experiment 1 trains a modified MobilenetV3 network, called MobilenetV3a, using transfer learning. Freezing different numbers of layers was tested, and the first 80 layers of the network were finally frozen. In this experiment, images are divided into two categories: images with masks and images without masks. Images with masks include both images with qualified masks and images with other occlusions.

Experiment 2 trains a modified MobilenetV3 network from scratch called MobilenetV3b. In this experiment, images were divided into three categories: images with qualified masks, images with other occlusions, and images without masks.

Experiment 3 cascades MobilenetV3a and MobilenetV3b for performance evaluation. In this experiment, images are divided into three categories: images with qualified masks, images with other occlusions, and images without masks.

Table 1 lists the models used, the pooling layers of each model, the activation function, and the number of frozen layers in Experiments 1, 2, and 3. The three experiments use different neural networks, and the network with the best performance is identified by comparing the experimental results. The experimental results are analyzed in Section 5.2.

5.2 Result analysis

In this section, the results of each experiment are presented and analyzed.

Figure 5 shows the results of Experiment 1 as the confusion matrix of the classification results of the fine-tuned MobilenetV3a. The confusion matrix, an analytical summary of the prediction results of a classification problem, covers the two classes of wearing a mask and not wearing a mask. In Figure 5, the 1 in the upper left corner is the proportion of the Mask class that is correctly classified. The 0 in the upper right corner is the proportion of the Mask class misclassified as NO-mask. The 0 in the lower left corner is the proportion of the NO-mask class misclassified as Mask. The 1 in the lower right corner is the proportion of the NO-mask class that is correctly classified. The color bar to the right of the confusion matrix maps values to colors, from light for small values to dark for large values, so the larger a value, the darker the background of its cell. In Figure 5, the cells containing 1 are clearly much darker than the cells containing 0. Because the difference between wearing a mask and not wearing one is very obvious, MobilenetV3a achieves an accuracy of 1 in Experiment 1.
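A row-normalized confusion matrix of the kind described above can be computed as in the following sketch. The label lists are invented toy data that happen to reproduce a perfect result; they are not the paper's predictions, and the function names are hypothetical.

```python
from typing import List, Sequence


def confusion_matrix(y_true: Sequence[str], y_pred: Sequence[str],
                     labels: Sequence[str]) -> List[List[int]]:
    """Count matrix: rows are true classes, columns are predicted classes."""
    index = {label: i for i, label in enumerate(labels)}
    counts = [[0] * len(labels) for _ in labels]
    for t, p in zip(y_true, y_pred):
        counts[index[t]][index[p]] += 1
    return counts


def normalize_rows(counts: Sequence[Sequence[int]]) -> List[List[float]]:
    """Divide each row by its total so every row holds proportions."""
    result = []
    for row in counts:
        total = sum(row)
        result.append([c / total if total else 0.0 for c in row])
    return result


labels = ["mask", "NO-mask"]
y_true = ["mask", "mask", "NO-mask", "NO-mask"]  # toy ground truth
y_pred = ["mask", "mask", "NO-mask", "NO-mask"]  # toy predictions
matrix = normalize_rows(confusion_matrix(y_true, y_pred, labels))
for label, row in zip(labels, matrix):
    print(label, row)
```

Each diagonal entry is the fraction of a class that was classified correctly, which is how the values 1, 0.85 and 0.92 in Figures 5 to 7 are to be read.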

Figure 6 shows the results of Experiment 2 as the confusion matrix of the classification results of MobilenetV3b trained from scratch. The confusion matrix covers three categories: wearing qualified masks, covering the face with other items, and not wearing masks. The 0.85 in the upper left corner means that 85% of OK-masks are correctly classified. Experiment 2 tests the classification results of MobilenetV3b alone, which are then compared with the results of the cascaded network in Experiment 3. In this experiment, NG-mask images are added, and because the features of NG-mask and OK-mask are similar, they interfere with classification, so the accuracy of the model on OK-mask drops significantly.

Figure 7 shows the confusion matrix of the classification results of the cascaded neural network in Experiment 3. The confusion matrix covers the classes of wearing qualified masks, wearing other items, and not wearing masks. The 0.92 in the upper left corner means that 92% of OK-masks are correctly classified. Compared with MobilenetV3b alone, the cascaded network improves the accuracy on OK-mask by 7 percentage points. This shows that, compared with a single model, the cascaded network can effectively improve classification accuracy.
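As a worked check of this comparison, using only the two OK-mask accuracies reported for Experiments 2 and 3, the improvement can be written as an absolute difference and, for reference, as a relative gain:

```latex
\[
0.92 - 0.85 = 0.07 \quad \text{(7 percentage points)}, \qquad
\frac{0.92 - 0.85}{0.85} \approx 0.082 \quad \text{(about 8.2\% relative)}.
\]
```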

Figure 8 shows the result of mask recognition for pedestrians by MobilenetV3b, and Figure 9 shows the result by the cascaded neural network. Both are applied to the same image, which contains all three cases: OK-mask, NO-mask and NG-mask. In Figure 8, the pedestrian with the schoolbag does not wear a mask, but MobilenetV3b judges that the pedestrian wears another covering. In Figure 9, the cascaded neural network correctly identifies that the pedestrian does not wear a mask. This is due to the MobilenetV3a network in the cascade, which filters out pedestrians without masks first and thus avoids confusing them with pedestrians wearing other coverings during the subsequent classification. The two figures show that the cascaded neural network classifies more effectively than a single model.

6 Conclusion

In this paper, we propose a cascaded convolutional neural network based on MobilenetV3 to identify whether a pedestrian is wearing a mask and whether that mask is a qualified one. We combined the images from the Face-Mask1 and Face-Mask2 databases to increase the types and number of PAIs, making the trained model more robust. We conducted three experiments. Comparing the results of a single MobilenetV3b neural network with those of the cascaded neural network shows that the cascaded network improves mask classification, raising OK-mask accuracy from 85% to 92%.

During an epidemic, it is dangerous to take off a mask for face recognition. In future work, we will further study face recognition while a mask is worn, in order to improve the safety of face recognition in an epidemic environment.

Data Availability

The data that support the findings of this study are openly available in a public repository at https://pan.baidu.com/s/1LG9CUt4X0knKbvh_vDe7RA?pwd=1111.

Acknowledgements

Thanks to all authors for their efforts.

Funding

This study is supported by the Foundation of National Natural Science Foundation of China (Grant No.: 62072273, 72111530206, 61962009, 61873117, 61832012, 61771231, 61771289); The Major Basic Research Project of Natural Science Foundation of Shandong Province of China (ZR2019ZD10); Natural Science Foundation of Shandong Province (ZR2019MF062); Shandong University Science and Technology Program Project (J18A326); Guangxi Key Laboratory of Cryptography and Information Security (No: GCIS202112); The Major Basic Research Project of Natural Science Foundation of Shandong Province of China (ZR2018ZC0438); Major Scientific and Technological Special Project of Guizhou Province (20183001), Foundation of Guizhou Provincial Key Laboratory of Public Big Data (No. 2019BD-KFJJ009), Talent project of Guizhou Big Data Academy. Guizhou Provincial Key Laboratory of Public Big Data. ([2018]01).

Author information

Authors and Affiliations

School of Computer Science, Qufu Normal University, Rizhao City, Shandong Province, China
Fengyin Li, Xiaojiao Wang, Yuhong Sun, Tao Li & Junrong Ge

Contributions

All authors contributed to the study conception and design. Fengyin Li put forward the main ideas, Xiaojiao Wang wrote the main manuscript text, Yuhong Sun wrote the main experimental code, Tao Li revised the manuscript text, and Junrong Ge searched for the required literature. All authors reviewed the manuscript.

Corresponding author

Correspondence to Tao Li.

Ethics declarations

Ethical Approval and Consent to Participate

The authors guarantee that this manuscript is an original work. This manuscript has not been published or presented elsewhere in part or in entirety and is not under consideration by another journal. We have read and understood your journal’s policies, and we believe that neither the manuscript nor the study violates any of these. All authors have seen and approved the final version of the submitted manuscript.

Human and Animal Ethics

The authors declare that this study does not involve human participants or animals.

Consent for Publication

All authors have checked the manuscript and have agreed to the submission.

Competing Interests

The authors have no competing interests to declare that are relevant to the content of this manuscript.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

This article belongs to the Topical Collection: Special Issue on Privacy and Security in Machine Learning Guest Editors: Jin Li, Francesco Palmieri and Changyu Dong.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.

About this article

Cite this article

Li, F., Wang, X., Sun, Y. et al. Transfer learning based cascaded deep learning network and mask recognition for COVID-19. World Wide Web 26, 2931–2946 (2023). https://doi.org/10.1007/s11280-023-01149-z

Received: 04 August 2022
Revised: 08 November 2022
Accepted: 01 February 2023
Published: 26 May 2023
Issue Date: September 2023
DOI: https://doi.org/10.1007/s11280-023-01149-z
