1 Introduction and motivation

There is a need for a model that automatically identifies face mask-wearing conditions and serves as a first step toward unmasking faces for face authentication in mobile devices and in security systems such as ATMs, banks, airport security checkpoints, and facial-biometric attendance systems.

Identifying face mask-wearing conditions is a challenging task: samples from different classes can be highly similar, while samples from the same class may differ considerably. In other words, there is great intra-class variation and small inter-class variation, which makes it difficult to learn discriminative features. Figure 1 depicts some samples from the three classes.

Fig. 1

Sample images from the MaskedFace-Net dataset. a Correctly Face Mask (CFM), b Incorrectly Face Mask (IFM), and c Not Face Mask (NFM) wearing. These samples show the challenge of the task: samples of different classes are highly similar, while those of the same class differ considerably

In this paper, a new method is developed for face mask-wearing identification, using well-known deep convolutional neural networks (CNNs) as feature extractors and a novel large margin piecewise linear (LMPL) classifier [1].

The proposed method consists of four main steps: image preprocessing, deep feature extraction, face mask-wearing classification, and face unmasking. It showed excellent performance in a computationally resource-limited environment, achieving 99.53% and 99.64% accuracy on the two classification tasks, respectively. Moreover, unmasking the masked faces showed promising results. It can be concluded that the proposed EfficientMask-Net method is effective for face mask-wearing identification as well as face unmasking, and it can therefore be used in many security systems for epidemic prevention and face authentication.

2 Related work

2.1 Masked face detection

Prasad et al. [2] proposed a lightweight model called “MaskedFaceNet” for real-time mask detection using a progressive semi-supervised approach. Fasfous et al. [3] presented BinaryCoP (Binary COVID-mask Predictor), a low-power binary neural network (BNN) classifier that performs classification on edge devices such as an embedded FPGA accelerator, to detect correct face mask-wearing and positioning. They used the MaskedFace-Net dataset with four classes (IMFD Nose and Mouth, IMFD Nose, IMFD Chin, and CMFD) and balanced the dataset with data augmentation techniques. An accuracy of up to 98% was obtained for the wearing-positioning problem.

2.2 Identification of face mask-wearing conditions

Cabani et al. [4] introduced the MaskedFace-Net dataset with 137,016 images. This large-scale dataset comprises the Correctly Masked Face Dataset (CMFD) and the Incorrectly Masked Face Dataset (IMFD), in which masked faces are created by applying a deformable model to the Flickr-Faces-HQ (FFHQ) face dataset. Qin et al. [5] proposed an image super-resolution and classification network (SRCNet), in which a super-resolution method was applied to improve performance on low-quality images. They classified face mask-wearing conditions into three classes (correct mask-wearing, incorrect mask-wearing, and no mask-wearing) and achieved an accuracy of 98.70%. Training and evaluation were performed on the public Medical Masks Dataset containing 3835 images.

2.3 Mobile-based face mask detection

Dey et al. [6] proposed a multi-stage, deep learning-based face mask detection method called “Mobile-Net Mask.” They used two datasets with 5200 images to detect masked or non-masked faces in still images and video streams, and Mobile-Net Mask reached an accuracy of 93%. Jiang et al. [7] presented RetinaFaceMask, a detector based on the one-stage RetinaNet, for high-accuracy face mask detection. The model uses ResNet or MobileNet as a backbone, along with a feature pyramid network (FPN) and context attention modules, and achieved a precision of 93.4%, higher than the baseline results.

As presented in this section, although researchers have introduced several approaches for identifying face mask-wearing conditions, a unified system that also supports face authentication is lacking. In this study, we developed a unified, efficient method for face mask-wearing identification together with unmasking of masked faces, which can be useful in authentication systems.

3 Materials and methods

This section describes the overall process of the proposed EfficientMask-Net method. Figure 2 shows a diagram of the proposed mask-wearing identification system.

Fig. 2

A schematic of the proposed EfficientMask-Net

3.1 Image preprocessing

Image preprocessing enhances the visual appearance of the images and results in higher accuracy of the detection system.

3.2 Resizing face images

The input images of EfficientNet were resized to \(224 \times 224 \times 3\) using bicubic interpolation.
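For illustration, a minimal Python sketch of this resizing step with Pillow follows; the paper's pipeline is implemented in MATLAB, so the library choice and the function name here are assumptions.

```python
from PIL import Image

def resize_for_efficientnet(path: str) -> Image.Image:
    """Load a face image and resize it to 224x224x3 with bicubic interpolation."""
    img = Image.open(path).convert("RGB")          # force 3 channels
    return img.resize((224, 224), Image.BICUBIC)   # bicubic, as in the paper
```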

3.3 Image adjustment

Real-world images exhibit considerable variation in contrast and exposure. The images were adjusted by mapping the input intensities to new values such that 1% of the pixel values are saturated at the low and high ends. In addition, the histogram of each image was computed to determine the adjustment limits automatically.
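This adjustment resembles MATLAB's stretchlim/imadjust pair. Below is a minimal NumPy sketch of such a percentile-based stretch, assuming 8-bit images; the function name is hypothetical.

```python
import numpy as np

def adjust_intensity(img: np.ndarray, sat: float = 1.0) -> np.ndarray:
    """Contrast-stretch an 8-bit image, saturating `sat` percent of the
    pixel values at each of the low and high ends of the histogram."""
    lo, hi = np.percentile(img, [sat, 100.0 - sat])   # automatic limits
    out = (img.astype(np.float64) - lo) / max(hi - lo, 1e-12)
    return (np.clip(out, 0.0, 1.0) * 255).astype(np.uint8)
```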

3.4 Deep feature extraction

Deep CNNs can extract high-level, abstract features. This study focused on a small network that is efficient in terms of computational power, and transfer learning was used to prevent overfitting and obtain better generalization.

EfficientNet was introduced by Tan and Le [8] in 2019. It is one of the most efficient CNN models among well-known pre-trained networks, with a small number of FLOPs. Compared with other models achieving similar ImageNet accuracy, EfficientNet is much smaller and faster; its authors showed that it is five times faster for inference on mobile devices [8].
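A minimal sketch of using a pre-trained EfficientNetB0 as a frozen deep feature extractor is shown below, here with torchvision rather than the MATLAB toolbox used in the paper; the 1280-dimensional output corresponds to torchvision's EfficientNet-B0 backbone.

```python
import torch
from torchvision.models import efficientnet_b0, EfficientNet_B0_Weights

# ImageNet-pre-trained EfficientNetB0 used purely as a fixed feature extractor.
model = efficientnet_b0(weights=EfficientNet_B0_Weights.IMAGENET1K_V1)
model.eval()

@torch.no_grad()
def extract_features(batch: torch.Tensor) -> torch.Tensor:
    """Map a (N, 3, 224, 224) batch to (N, 1280) deep feature vectors."""
    x = model.features(batch)   # convolutional backbone
    x = model.avgpool(x)        # global average pooling
    return torch.flatten(x, 1)  # features to feed the LMPL classifier
```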

3.5 Large margin piecewise linear (LMPL) classifier

The novel large margin piecewise linear (LMPL) classifier [1] is based on a cellular structure. First, a grid is imposed on the feature space: a set of random hyper-planes partitions the feature space into sub-partitions called cells, and each cell is assigned a class label based on the training instances it covers. The main challenge is tuning these initial hyper-planes. With respect to a given hyper-plane, training samples fall into three groups:

1) Normal: Ordinary samples, which are correctly classified on just one side of the hyper-plane. Their loss function is the hinge loss defined in (1):

$$ l(x)_{\text{Normal}^{(\tilde{y})}} = \max\left(0,\; 1 - \tilde{y}\left(w^{T}x + b\right)\right), \qquad \tilde{y} \in \{-1, +1\} $$
(1)

where \(\tilde{y}\) is the virtual label of sample \(x\), determining on which side of the hyper-plane \(x\) is correctly classified.

2) Negative don’t care: These samples are classified incorrectly on both sides of the hyper-plane. Their loss is defined in (2):

$$ l(x)_{\text{DontCare}^{-}} = \max\left(l(x)_{\text{Normal}^{(+1)}},\; l(x)_{\text{Normal}^{(-1)}}\right) $$
(2)
3) Positive don’t care: This group is the opposite of Negative don’t care: these samples are classified correctly on both sides of the hyper-plane. Their loss function is defined in (3):

$$ l(x)_{\text{DontCare}^{+}} = \min\left(l(x)_{\text{Normal}^{(+1)}},\; l(x)_{\text{Normal}^{(-1)}}\right) $$
(3)

Positive don’t care samples, which are always classified correctly, are ignored in this paper, mainly because the loss function in (3) is not convex. The objective function is therefore defined as in (4):

$$ \min_{w,\,b}\; \frac{1}{2}\left\| w \right\|^{2} + C_{1} \sum_{x \in \text{Normal}} l(x)_{\text{Normal}^{(\tilde{y})}} + C_{2} \sum_{x \in \text{DC}^{-}} l(x)_{\text{DontCare}^{-}} $$
(4)

The scalar values \(C_{1}\) and \(C_{2}\) control the balance between the structural and the empirical error. In this paper, both \(C_{1}\) and \(C_{2}\) were tuned experimentally and set to 1000.

The LMPL classifier optimizes each hyper-plane with a convex optimizer based on the objective function introduced above. After some iterations, the model converges to a set of hyper-planes that classify the samples of the different classes, and extra hyper-planes that do not contribute to the classification are removed. Thus, by removing redundant hyper-planes, the complexity of the model is tuned to the distribution and complexity of the decision boundaries, yielding an efficient large-margin approach.
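The NumPy sketch below illustrates objective (4) for a single hyper-plane together with one subgradient-descent step. It is only an illustrative reading of the loss definitions above, not the authors' implementation, and all function and variable names are hypothetical.

```python
import numpy as np

def lmpl_objective(w, b, Xn, y_virt, Xdc, C1=1000.0, C2=1000.0):
    """Objective (4) for one hyper-plane (w, b).

    Xn, y_virt: Normal samples and their virtual labels in {-1, +1}, Eq. (1).
    Xdc:        Negative don't-care samples, charged the larger of the two
                hinge losses, Eq. (2).
    """
    hinge_normal = np.maximum(0.0, 1.0 - y_virt * (Xn @ w + b))
    s = Xdc @ w + b
    hinge_dc = np.maximum(np.maximum(0.0, 1.0 - s), np.maximum(0.0, 1.0 + s))
    return 0.5 * w @ w + C1 * hinge_normal.sum() + C2 * hinge_dc.sum()

def lmpl_step(w, b, Xn, y_virt, Xdc, lr=1e-4, C1=1000.0, C2=1000.0):
    """One subgradient-descent step on objective (4)."""
    gw, gb = w.copy(), 0.0                      # gradient of the margin term
    # Normal samples with an active hinge contribute -y*x (and -y for b).
    act = 1.0 - y_virt * (Xn @ w + b) > 0.0
    gw -= C1 * (y_virt[act][:, None] * Xn[act]).sum(axis=0)
    gb -= C1 * y_virt[act].sum()
    # Negative don't-care: subgradient of the dominating hinge in Eq. (2),
    # which acts like a virtual label of +1 or -1 for each sample.
    s = Xdc @ w + b
    sign = np.where(np.maximum(0.0, 1.0 - s) >= np.maximum(0.0, 1.0 + s),
                    1.0, -1.0)
    gw -= C2 * (sign[:, None] * Xdc).sum(axis=0)
    gb -= C2 * sign.sum()
    return w - lr * gw, b - lr * gb
```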

3.6 Unmasking the face

Owing to its contactless nature, face-based biometric recognition is preferred, especially in the pandemic era. However, such systems are designed for non-occluded faces [9]. The proposed method was therefore designed to work with existing face authentication methods and to avoid retraining them on masked-face datasets. Most recent works have focused exclusively on the eye area [10] or on retraining existing methods on simulated masked faces [11].

1) Image segmentation

As the first step, faces were segmented into Mask and Non-Mask segments to determine the missing parts of the face. Figure 3a illustrates an example of an input masked face and the resulting segmented face.

Fig. 3

The steps of unmasking the faces. a A masked face and the resulting segmented face. b The 25 images generated by the GAN. c The selected synthetic face and the facial parts extracted from the mask area. d The unmasked face
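The paper does not detail how the Mask/Non-Mask segmentation of step 1 is obtained, so the sketch below is only a stand-in: it assumes a light-blue surgical mask and uses a simple HSV color threshold with OpenCV, and the threshold values are illustrative.

```python
import cv2
import numpy as np

def segment_mask_region(face_bgr: np.ndarray) -> np.ndarray:
    """Illustrative Mask / Non-Mask segmentation via an HSV color threshold.
    Returns a binary map in which nonzero pixels belong to the mask."""
    hsv = cv2.cvtColor(face_bgr, cv2.COLOR_BGR2HSV)
    mask = cv2.inRange(hsv, (90, 40, 40), (130, 255, 255))  # light-blue range
    kernel = np.ones((7, 7), np.uint8)
    # Morphological closing removes small holes and specks in the map.
    return cv2.morphologyEx(mask, cv2.MORPH_CLOSE, kernel)
```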

2) Generating synthetic faces

A generative adversarial network (GAN) was trained on 15,000 real-world faces without face masks. Then, 25 synthetic faces were generated by the trained GAN to complete the masked faces, as shown in Fig. 3b.

3) Selecting the matched generated face

The distance between the masked face and each generated face was calculated at the pixel level using the normalized root-mean-square error (NRMSE), which ranges from 0 (identical) to 1 (completely different). The synthetic face with the smallest value was selected to complete the masked face; an example is shown in Fig. 3c.
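A minimal sketch of this selection step follows; it uses one common NRMSE convention (RMSE of images normalized to [0, 1]), and the helper names are hypothetical.

```python
import numpy as np

def nrmse(a: np.ndarray, b: np.ndarray) -> float:
    """Normalized RMSE between two 8-bit images: 0 = identical."""
    a = a.astype(np.float64) / 255.0
    b = b.astype(np.float64) / 255.0
    return float(np.sqrt(np.mean((a - b) ** 2)))

def select_matched_face(masked_face, synthetic_faces):
    """Pick the synthetic face closest to the masked face at the pixel level."""
    scores = [nrmse(masked_face, s) for s in synthetic_faces]
    return synthetic_faces[int(np.argmin(scores))]
```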

4) Face completion

Facial parts of the mask area were extracted from the selected synthetic face to fill the missing parts of the masked face. An example of the final output of the proposed method is shown in Fig. 3d.
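A minimal compositing sketch of this final step, reusing the binary mask map from the segmentation step; the names are hypothetical.

```python
import numpy as np

def complete_face(masked_face: np.ndarray, synthetic_face: np.ndarray,
                  mask_region: np.ndarray) -> np.ndarray:
    """Fill the mask area of `masked_face` with the corresponding facial
    parts of the selected `synthetic_face` (aligned, same size)."""
    region = mask_region.astype(bool)       # binary map from segmentation
    out = masked_face.copy()
    out[region] = synthetic_face[region]    # copy facial parts into mask area
    return out
```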

Algorithm 1 shows the whole process of the proposed EfficientMask-Net method.

4 Experimental results

4.1 Experimental setup

All experiments were implemented using the deep learning and image processing toolboxes of MATLAB R2021a. A Core i7 4.00 GHz CPU with 24 GB of RAM was used to implement EfficientMask-Net. The Adam optimizer [12] with \(\beta_{1} = 0.9\), \(\beta_{2} = 0.999\), and \(\epsilon = 10^{-8}\) was used, and a weight decay of \(10^{-4}\) was applied for L2 regularization to avoid overfitting.

The network was trained for five epochs with a mini-batch size of 64. The initial learning rate was set to \(10^{-3}\) and was dropped by a factor of 0.1 every three epochs to speed up convergence. In addition, the training dataset was shuffled every epoch.
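For reference, an equivalent training configuration expressed in PyTorch is sketched below (the paper used MATLAB); the stand-in linear model and the interpretation of the drop factor as acting every three epochs are assumptions.

```python
import torch

model = torch.nn.Linear(1280, 3)   # stand-in for the fine-tuned network head
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3,          # initial LR
                             betas=(0.9, 0.999), eps=1e-8,
                             weight_decay=1e-4)                     # L2 term
# Drop the learning rate by a factor of 0.1 every three epochs.
scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=3, gamma=0.1)

for epoch in range(5):             # five epochs, shuffled mini-batches of 64
    # ... one pass over the shuffled training mini-batches goes here ...
    scheduler.step()
```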

Algorithm 1 The overall process of the proposed EfficientMask-Net method

In this study, two experiments were carried out for two different classification schemes:

I. Experiment 1: Correctly Face Mask (CFM), Incorrectly Face Mask (IFM), and Not Face Mask (NFM) wearing

II. Experiment 2: Uncovered Chin IFM, Uncovered Nose IFM, and Uncovered Nose and Mouth IFM


4.2 MaskedFace dataset

In this study, we combined the novel MaskedFace-Net and the well-known Flickr-Faces-HQ (FFHQ) datasets. FFHQ is an open-access, high-quality dataset of PNG images at \(1024 \times 1024\) resolution. The original FFHQ images were used as the Not Face Mask (NFM) class. The details of the class samples and the related experiments are listed in Table 1. In total, 14,783 and 4992 face images were used in Experiments 1 and 2, respectively. The complete dataset for each experiment can be found in the Zenodo repository (https://zenodo.org/record/4892677).

Table 1 Details of face image dataset

4.3 Experimental results and analysis

1) Performance Analysis

Several lightweight deep networks were compared, both as end-to-end networks and as feature extractors combined with the novel LMPL classifier (denoted CNN+), in terms of different metrics, as shown in Tables 2 and 3. In both experiments, EfficientNetB0 achieved the best results in both schemes, as an end-to-end network and as a feature extractor with the LMPL classifier (EfficientNetB0+).

Table 2 Comparison of deep CNNs as an end-to-end network and as a feature extractor with the proposed LMPL classifier (\({\text{CNN}}^{+}\)) in Experiment 1: Correctly Face Mask (CFM), Incorrectly Face Mask (IFM), and Not Face Mask (NFM) wearing
Table 3 Comparison of deep CNNs as an end-to-end network and as a feature extractor with the proposed LMPL classifier (\({\text{CNN}}^{+}\)) in Experiment 2: Uncovered Chin IFM, Uncovered Nose IFM, and Uncovered Nose and Mouth IFM

The novel LMPL was also compared with well-known classifiers. As illustrated in Tables 4 and 5, LMPL outperformed all the other classifiers on the performance metrics and achieved the best classification accuracy in both experiments.

Table 4 Comparison of well-known classifiers with the proposed LMPL classifier in Experiment 1: Correctly Face Mask (CFM), Incorrectly Face Mask (IFM), and Not Face Mask (NFM) wearing
Table 5 Comparison of well-known classifiers with the proposed LMPL classifier in Experiment 2: Uncovered Chin IFM, Uncovered Nose IFM, and Uncovered Nose and Mouth IFM
2) Statistical Analysis

The Friedman test is a popular statistical analysis for the simple, nonparametric, and safe comparison of three or more related samples. It makes no assumptions about the underlying data distribution. The test ranks the methods for each metric independently; \(R_{j}\) denotes the average rank of the \(j\)th method across the different metrics. Note that in the case of a tie, i.e., identical performance, the same ranks are assigned.
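As an illustration, the Friedman test can be run with SciPy; the per-metric scores below are hypothetical placeholders for three compared methods.

```python
from scipy.stats import friedmanchisquare

# Hypothetical scores of three methods on four metrics (rows must align).
method_a = [99.1, 98.7, 99.0, 98.9]
method_b = [99.3, 99.0, 99.2, 99.1]
method_c = [99.5, 99.4, 99.6, 99.5]

stat, p = friedmanchisquare(method_a, method_b, method_c)
print(f"Friedman chi-square = {stat:.3f}, p-value = {p:.4f}")
# A small p-value indicates a significant difference among the methods.
```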

As can be seen in Tables 2, 3, 4 and 5, the novel LMPL improved the performance metrics significantly and obtained the best average ranks in all cases. These tables reveal a significant difference between the efficiency of the different methods.

3) Visual Analysis

The gradient-weighted class activation mapping (Grad-CAM) technique [15] was used for detailed visual analysis; it provides a visualization of the deep features extracted by the fine-tuned EfficientNetB0, as shown in Fig. 4. Grad-CAM interprets deep CNN predictions and checks whether the CNN is focusing on the right parts of the input image, with prediction regions investigated using heat maps. The spatial parts with the greatest impact on the network score were identified by Grad-CAM heat mapping. The standard jet color map was used, in which red and yellow indicate regions with a high contribution to correct predictions and blue denotes regions with a low contribution. As can be seen, the fine-tuned EfficientNetB0 identified the regions that are effective for the classification predictions well.

Fig. 4

Grad-CAM visualization results for face images of the different classes. a Correctly Face Mask (CFM), b Incorrectly Face Mask (IFM), and c Not Face Mask (NFM) wearing. (Original images are shown in Fig. 1)
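A hook-based Grad-CAM sketch in PyTorch for an EfficientNetB0-style network is given below; it is an illustrative re-implementation of the technique in [15], not the paper's MATLAB code, and the untrained stand-in model and hooked layer are assumptions.

```python
import torch
import torch.nn.functional as F
from torchvision.models import efficientnet_b0

model = efficientnet_b0()          # stand-in; the paper fine-tunes this network
model.eval()
feats, grads = {}, {}

# Capture activations and gradients of the last convolutional block.
layer = model.features[-1]
layer.register_forward_hook(lambda m, i, o: feats.update(act=o.detach()))
layer.register_full_backward_hook(lambda m, gi, go: grads.update(g=go[0].detach()))

def grad_cam(x: torch.Tensor, class_idx: int) -> torch.Tensor:
    """Heat map of the regions that drive the score of `class_idx`."""
    score = model(x)[0, class_idx]
    model.zero_grad()
    score.backward()
    weights = grads["g"].mean(dim=(2, 3), keepdim=True)   # GAP of gradients
    cam = F.relu((weights * feats["act"]).sum(dim=1))     # weighted feature sum
    cam = F.interpolate(cam.unsqueeze(1), size=x.shape[2:],
                        mode="bilinear", align_corners=False)
    return (cam / cam.max().clamp(min=1e-8)).squeeze()    # normalize to [0, 1]

heat = grad_cam(torch.randn(1, 3, 224, 224), class_idx=0)  # toy input
```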

4) Comparison with State-of-the-Art Studies

According to Table 6, the developed method showed superior performance compared with several recent studies. It can be concluded that the proposed EfficientMask-Net can be useful in face mask-wearing monitoring systems, especially in public places, to control the spread of coronavirus, as well as in face authentication services on lightweight devices such as mobile phones.

Table 6 Comparison of the proposed method with state-of-the-art deep models in face mask detection (CFM = correctly face mask-, IFM = incorrectly face mask-, NFM = not face mask-wearing)

5 Conclusion and future work

The proposed EfficientMask-Net model is lightweight and requires few computational resources. Hence, it can be useful in real-time systems that identify mask-wearing conditions in public places for epidemic prevention. Two experiments were conducted to evaluate the proposed method against various deep CNNs. EfficientNetB0 with the novel LMPL classifier showed the best average accuracy in both experiments, 99.53% and 99.64%, respectively. Face unmasking was also performed on masked faces and showed promising results that can be useful in face authentication systems.

In the future, the proposed method can be extended to real-world masked-face datasets. To improve face unmasking, existing methods for face completion under occlusion can be applied to masked faces. In addition, the impact of unmasking on existing face recognition methods can be investigated.