1 Introduction

The spread of coronavirus disease 2019, commonly known as COVID-19, is a significant concern worldwide. It is a contagious disease that has affected human life globally [108, 117]. Health specialists suggest that the virus may be transmitted by direct or indirect contact with an infected person [119]; hence, measures like the compulsory wearing of face masks [40], as illustrated in Fig. 1, have been strictly enforced by medical bodies. Numerous studies advise wearing a face mask even if a person is not feeling sick. COVID-19 is not the first occasion on which wearing face masks has been stressed to combat transmission; the practice can be dated back to the 1910–11 Manchurian epidemic in China [60]. Various pandemics in history have been weathered with the help of face masks. Besides, various studies indicate that it is not merely wearing a face mask but wearing it properly that limits the transmission of the virus to a considerable extent. The observation that the greater the proportion of the population wearing face masks in a country, the fewer the COVID-19 cases in that country has created the need for an automated face mask detector.

Fig. 1 Precautions to avoid COVID-19

Further, the coronavirus pandemic has called for scientific contributions from across the globe to help in battling it. Leveraging contemporary technical advancements, numerous solutions to prevent the transmission of the virus have been formulated. In [71], the authors put forward an updated mask detection architecture working with a noteworthy efficiency of 97%. In [5], face masks were detected using PyTorch, with results being 97% accurate. Further, [95] proposed the detection of several kinds of masks using a state-of-the-art method and obtained the output by applying the model in real time. CNN-based detectors have been used on custom-collected face mask datasets in [21]. Another study formulated an application that inspects people wearing face masks in public areas [31]. Additionally, an existing dataset was enriched with more images in [79]; that work used the Faster R-CNN model and achieved an accuracy of 99.8%. In [30], the authors put forward a system for verifying the correct position of an individual’s face mask, while [72] discusses the various technological methods available to deal with the virus.

With the advancements in technology that the world has been witnessing, various techniques are available [7, 48, 74, 76, 113] that could prove valuable to society if used effectively. A real-time system that could by itself classify a person it sees into one of two categories [77]:

  1) A person wearing a face mask

  2) A person not wearing a face mask

could be useful in the present times. Such systems could find applications in public areas like hospitals, airports, and malls. One way to build the detector is to first detect the faces in real time: after the faces are detected in the webcam stream, the frames containing them are saved and then passed to a classifier. The numerous algorithms that could be used for categorization are discussed in the subsequent sections. Another way to achieve the same is to use an object detection model. The contributions of this work in view of the current state of the art are as follows.

  • Although several precautions are recommended to stay safe from COVID-19, face masking and social distancing remain significant factors. It was therefore necessary to bring the many face mask detection techniques under one umbrella for the research community.

  • In keeping with the need of the hour, the proposed work reviews several studies conducted in the field of face mask detection. The strengths of many publications on face masking are discussed, along with aspects that are still missing from the literature, such as observations, current and future trends, and a broad set of references.

  • Performance parameters of several algorithms are compared, and discussions on them are presented to increase the efficacy of the review.

1.1 Motivation and trends in recent years

With time, the surge in COVID-19 cases urged people to be cautious, alert, and to take all possible safety measures. In situations such as this, where a mere sneeze could be harmful to many people, safety remains the priority. To ensure the well-being of everyone, a system that could by itself monitor whether a face mask is being worn is needed. It would protect not only the individual but also the people in the vicinity. With access to state-of-the-art technological methods, implementing such a system could be a boon to society.

After analyzing the problem statement, numerous studies on the same topic were scrutinized to commence the research. Then, the content relevant to the issue was filtered, and an in-depth understanding of the topic was attained. Further, several existing datasets were explored, along with the available techniques. A literature survey of the available methods was conducted, followed by a comparison of the different algorithms. Next, the supporting software and, subsequently, the applications were explored. Finally, the future scope was examined, as shown in Fig. 2.

Fig. 2 Methodology used

Initially, around 180 papers were identified from various publishers such as Springer. The collected documents were then checked for duplicates, which were removed if found. Next, the articles were screened for eligibility based on their relevance to the problem statement, leaving about 140 papers. The papers were further assessed for quality, bringing the count down to 130. Of these, around 100 papers were analyzed to understand the various techniques available, including the state of the art. A few more publications were examined to gather knowledge about the available datasets.

The paper is organised as follows: Section 2 presents the general flow chart of the face mask detector. Section 3 discusses the various techniques that could be used to implement a face mask detector, while Section 4 reviews some of the real-time methods. Section 5 analyses the trends in techniques over the last two decades, along with the advantages and challenges of the techniques discussed in Section 3. Section 6 lists the URLs of multiple datasets available online. Section 7 suggests several useful software tools that could be used to carry out the process, followed by Section 8, which states the use cases, limitations, and observations. Section 9 provides the conclusions of the study along with future directions.

Figure 3 illustrates the number of publications on face mask detectors in the last two decades. Owing to COVID-19, such detectors became a hot topic of study among researchers in 2020.

Fig. 3 The number of publications on face mask detectors from 2000 to 2022 (the year 2022 includes data until January 11), as taken from Semantic Scholar using the search phrase “Face Mask Detector”

2 General flow chart

The implementation of the face mask detector system could be executed in two phases, as shown in Fig. 4.

Fig. 4 The proposed flow diagram for face mask detection system

The first phase is the training phase. This stage begins with the collection of the dataset. One of the most crucial steps is to have data of good quantity and quality [1]. One can prepare the dataset or use existing datasets from the various available sources. When preparing one’s own dataset, its size can be increased using techniques like data augmentation. Also, the data has to be cleaned before use because it plays a significant role in building a model. The various steps involved in data cleaning are shown in Fig. 5. After a good-quality dataset is obtained, a model is selected according to the system’s demands and trained on the chosen dataset. Multiple techniques could be used to accomplish this.

Fig. 5 Steps involved in data cleaning
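The data augmentation mentioned for the training phase can be illustrated with a minimal pure-Python sketch. It assumes grayscale images stored as rows of 0–255 integers; the function names, the brightness offset, and the toy data are illustrative assumptions and are not taken from any of the reviewed works. In practice, an image library would normally be used instead.

```python
def flip_horizontal(image):
    """Mirror a grayscale image (list of pixel rows) left to right."""
    return [row[::-1] for row in image]


def adjust_brightness(image, offset):
    """Add a constant to every pixel, clamping the result to 0..255."""
    return [[max(0, min(255, p + offset)) for p in row] for row in image]


def augment(dataset, offset=40):
    """Return each original plus one flipped and one brightened copy."""
    augmented = []
    for image, label in dataset:
        augmented.append((image, label))
        augmented.append((flip_horizontal(image), label))
        augmented.append((adjust_brightness(image, offset), label))
    return augmented


# Toy 2x3 "image" labelled as a masked face (hypothetical data).
sample = [([[10, 20, 30], [200, 230, 250]], "mask")]
bigger = augment(sample)
```

Each input image yields three training examples with the same label, so the dataset triples in size without new data collection.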

Acquiring the most suitable trained model brings the first phase to an end. In the second phase, frames from the live video feed or still images are used as input to the trained model. The live video feed could be obtained from a mobile phone, a camera, or a surveillance camera and hence could vary in format, e.g., H.265 or H.264.

There are several cases where the video cannot capture the images as desired; the recording may, for example, be blurred or noisy. In such scenarios, image pre-processing comes to the rescue. OpenCV provides several methods that could be used to enhance the quality of the image. For instance, blurriness could be reduced by applying a sharpening kernel with OpenCV’s filter2D function. The image denoising techniques of the same library are helpful for dealing with noisy images. Various transforms or histogram-based methods could be used for the same purpose. Additionally, object tracking could be considered to detect faces. Though these are ways to deal with such discrepancies, the aim should be to capture good-quality videos (Fig. 6).

Fig. 6 Video pre-processing techniques
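To make the sharpening step concrete, the sketch below applies a common 3x3 sharpening kernel to a small grayscale image in pure Python. It is only an illustration of the idea: with OpenCV, a kernel of this kind would typically be passed to `cv2.filter2D(image, -1, kernel)`. The kernel values, the helper name `apply_kernel`, and the edge handling (edge pixels are replicated, which differs from OpenCV's default border mode) are assumptions made for this example.

```python
# A widely used sharpening kernel: boosts the centre pixel, subtracts neighbours.
SHARPEN_KERNEL = [
    [0, -1, 0],
    [-1, 5, -1],
    [0, -1, 0],
]


def apply_kernel(image, kernel):
    """Correlate a 3x3 kernel with a grayscale image (list of pixel rows)."""
    height, width = len(image), len(image[0])
    output = []
    for y in range(height):
        row = []
        for x in range(width):
            total = 0
            for ky in range(3):
                for kx in range(3):
                    # Replicate edge pixels for out-of-range neighbours.
                    sy = min(max(y + ky - 1, 0), height - 1)
                    sx = min(max(x + kx - 1, 0), width - 1)
                    total += kernel[ky][kx] * image[sy][sx]
            row.append(max(0, min(255, total)))  # clamp to the 8-bit range
        output.append(row)
    return output


flat = [[100] * 3 for _ in range(3)]                   # uniform region
edge = [[50, 50, 200], [50, 50, 200], [50, 50, 200]]  # vertical edge
sharpened_flat = apply_kernel(flat, SHARPEN_KERNEL)
sharpened_edge = apply_kernel(edge, SHARPEN_KERNEL)
```

Uniform regions are left unchanged because the kernel weights sum to 1, while the contrast across the edge is increased, which is the effect perceived as sharpening.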

3 Face mask detection techniques

Some of the several techniques used in face mask detection are discussed below (Fig. 7):

Fig. 7 Approach for object detection methods

3.1 Object detection

Deep learning techniques have recently gained momentum because of their ability to train on vast amounts of data with high accuracy [102]. These state-of-the-art methods prioritise accuracy in some cases and speed in others. In view of the advantages of deep learning techniques in real-time applications, this section discusses object detection using the deep learning approach [19, 29, 42, 46, 109, 114].

Within computer vision, object detection identifies and locates objects of certain classes in images and videos. This is illustrated in Fig. 8. The technique uses bounding boxes to localize the objects in the input image and can also count the objects in a given image. Various object detection algorithms have become available lately [37, 41, 121]. They are categorized into [92]:

  • Two-Shot Detection

  • Single-Shot Detection

Fig. 8 Object detection segments

3.1.1 Two-shot detectors

This model achieves the target in two steps: region proposal, followed by classification of those regions and refinement of the location prediction. Various models in this category are:

  • Faster Region-Based Convolutional Neural Network

It is an improved version of the earlier proposed R-CNN [91] and Fast R-CNN, and it comes with a better region-based CNN architecture [25]. Moreover, it is one of the most extensively employed advanced algorithms with the R-CNN backbone. Compared to the earlier models, it replaces the selective search algorithm used to identify regions of interest (RoI) with a region proposal network. The detailed diagram explaining this is shown in Fig. 9. When accuracy is the main concern, this algorithm is given preference. In [82], the author performs company logo detection using this technique. Also, in [22], the algorithm is used to identify the stages in malaria-infected blood. In [39], the author uses this state-of-the-art model to monitor people wearing face masks in public areas. Furthermore, several researchers [6, 14, 63, 87, 94, 103, 115] have leveraged this method.

Fig. 9 Faster R-CNN [33]

  • Region-Based Fully Convolutional Network

It is a two-shot architecture developed by taking inspiration from Faster R-CNN. Unlike in Faster R-CNN, all the complex computation is finished before RoI pooling, which is applied on score maps. All region proposals use the same score maps to perform average voting. Also, all the layers are convolutional and computed on the entire image. It can be regarded as a hybrid of one-shot and two-shot models. The architecture is shown in Fig. 10. The related works are discussed in detail in [15, 54, 106].

Fig. 10 R-FCN [34]

3.1.2 Single-shot detectors

They are usually used when speed is a priority. This is because they predict the bounding boxes and the classes with a single deep neural network, without a dedicated step for proposing bounding boxes. Therefore, they find numerous applications in real-time detection.

  • You Only Look Once

Rather than processing an image region by region, the algorithm performs detection in a single pass. The input image is passed through multiple layers of the network, which eventually produces a prediction as output [62]. YOLOv3 uses DarkNet-53, a 53-layer CNN trained on ImageNet, to extract features. It also uses residual networks, which rely on skip connections [80]. Anchor boxes serve as predefined reference boxes from which the bounding boxes giving the detected object’s location are predicted. The model also predicts the class probabilities for each grid cell. The non-max suppression algorithm is then used to eliminate bounding boxes that are not required: boxes are discarded based on their IoU (Intersection over Union) with a higher-scoring box (Fig. 11).

Fig. 11 Working of YOLO [32]
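The role of IoU in non-max suppression can be sketched in a few lines of pure Python. This is a generic, simplified illustration rather than the exact post-processing of any particular YOLO release; the box format `(x1, y1, x2, y2)`, the 0.5 threshold, and the sample detections are assumptions chosen for the example.

```python
def iou(box_a, box_b):
    """Intersection over Union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def non_max_suppression(detections, iou_threshold=0.5):
    """Keep the highest-scoring box of each group of overlapping boxes.

    detections: list of (box, score) pairs.
    """
    remaining = sorted(detections, key=lambda d: d[1], reverse=True)
    kept = []
    while remaining:
        best = remaining.pop(0)
        kept.append(best)
        # Drop every box that overlaps the kept box too strongly.
        remaining = [d for d in remaining if iou(best[0], d[0]) < iou_threshold]
    return kept


detections = [
    ((10, 10, 110, 110), 0.90),
    ((20, 20, 120, 120), 0.75),    # overlaps the first box heavily
    ((200, 200, 260, 260), 0.80),  # a separate face
]
kept = non_max_suppression(detections)
```

Here the second box has an IoU of about 0.68 with the first and is suppressed, while the third box does not overlap and is kept.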

Further, YOLO has gained attention because of its speed [58]. Moreover, its ability to learn generalized representations of objects and make predictions with high accuracy helps it outperform other comparable models. In [85], the author has enhanced the traditional YOLOv4 series to propose a novel detector. Likewise, in [11], this state-of-the-art technique has been implemented to improve the performance of mask detectors. A similar approach is elucidated in various other works [2, 8, 38, 43, 49, 52, 55, 57, 83, 88, 98].

  • Single shot multibox detector

It uses VGG-16 as its backbone architecture, discarding the fully connected layers [12]. The model can be viewed as two components: extraction of feature maps, followed by the application of convolution filters to detect objects. It works by matching objects with default boxes of different aspect ratios. Whenever a box meets the set minimum IoU threshold, it is considered a match. Besides, after approximation, each feature map location is scaled, and predictions are made from the feature maps so that objects of multiple sizes are handled, as shown in Fig. 12.

Fig. 12 Single Shot Multibox Architecture [35]

In [65], real-time face mask detection is discussed with changes in the architecture used. [68] provides a way to execute the algorithm. Also, [81] discusses the use of the model in detecting objects for blind people. Further, a different approach is used in [23] for object detection. In [17], an improved way of detecting face masks using SSD has been implemented; the authors improved the algorithm by using inverse convolution and feature fusion. A similar technique is used in [53].

It can be observed from Table 1 that single-shot detectors, including YOLO and SSD, have a higher inference speed than Faster R-CNN, owing to faster localization and categorization. The algorithm to be used is chosen depending on the requirements of the problem. Generally, Faster R-CNN, because of its lower detection speed, is employed when the results need not be obtained in real time, whereas YOLO is the choice of practitioners when working with a live data feed. SSD maintains a balance between speed and detection effectiveness.

Table 1 Comparison table of state-of-the-art detection models [78]

3.1.3 Feature extraction

Feature extraction is a way to get rid of unnecessary information in the data, thereby reducing the computational cost while retaining the important and relevant data. The reduced data also helps the model learn faster. Real-time face mask detection leverages both machine learning and deep learning techniques for feature extraction. In deep learning, the neural networks themselves extract features without human intervention. The input data is passed to a feature extraction network built on one of several backbone architectures, including MobileNetV2 and Xception [71]. Subsequently, the result is forwarded to the classifier network, which categorizes a person as with or without a mask. On the other hand, algorithms like histogram of oriented gradients (HOG) and principal component analysis (PCA) could be used to obtain features in a machine learning model [29, 71]. Additionally, features could be extracted manually using the methods mentioned in Fig. 13.

Fig. 13 Feature extraction techniques in face mask detection [12, 15, 42, 93]

3.2 Other techniques

Alternatively, the problem could be approached in two stages. The face mask detector could be constructed by first performing face detection on the frames coming from the video feed and then giving the frames with faces as input to a classifier, which produces the desired output, i.e., faces with or without masks (Fig. 14).

Fig. 14 Face mask detection could be implemented by first performing face detection, followed by face mask classification on each individual
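The two-stage flow described above (detect faces, then classify each face) can be sketched as a small pipeline. The detector and the classifier are passed in as plain functions so that any of the methods discussed in this section could be plugged in; the two dummy functions below are hypothetical stand-ins that only make the sketch runnable and do not represent a real model.

```python
def crop(frame, box):
    """Cut the (x1, y1, x2, y2) region out of a frame stored as pixel rows."""
    x1, y1, x2, y2 = box
    return [row[x1:x2] for row in frame[y1:y2]]


def detect_masks(frame, face_detector, mask_classifier):
    """Stage 1: find face boxes. Stage 2: classify each cropped face."""
    results = []
    for box in face_detector(frame):
        label = mask_classifier(crop(frame, box))
        results.append((box, label))
    return results


# Hypothetical stand-ins so the sketch runs without a trained model.
def dummy_face_detector(frame):
    return [(0, 0, 2, 2), (2, 0, 4, 2)]


def dummy_mask_classifier(face):
    # Pretends that bright crops are masked faces; a real system would
    # use a trained classifier such as a CNN or an SVM here.
    pixels = [p for row in face for p in row]
    return "mask" if sum(pixels) / len(pixels) > 128 else "no_mask"


frame = [
    [250, 240, 10, 20],
    [230, 220, 30, 40],
]
results = detect_masks(frame, dummy_face_detector, dummy_mask_classifier)
```

Keeping the two stages behind separate callables means the face detector and the classifier can be evaluated and replaced independently.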

Elaborating on the above-mentioned points, FACE DETECTION is a technology by which human faces are detected in an image. The location of each face is marked using a bounding box. Numerous aspects have to be considered to perform successful detection [51]. Owing to their advantages, neural networks are also used for detection [104]. The technology is in use in various applications. Some of the different methods for face detection are listed below:

  • Dlib

Dlib performs face detection using deep learning through convolutional neural networks. It performs better than the HOG-based method, even on faces at odd angles. A detailed implementation of the library is illustrated in [86, 111].

  • Multi-task Cascaded Convolutional Neural Network

It is a CNN-based method that works in three stages to detect and localize faces and key facial points [120]. Besides, [110] conducted facial recognition using MTCNN. In [28], the real-time application of detecting people with or without face masks using this method is illustrated. Likewise, a detailed study is presented in [50].

  • RetinaFace

It is a single-stage detector that works on pixel-wise face localization and simultaneously predicts face box, face score, and facial key points. An elaborate discussion is presented in multiple pieces of research [16, 26, 69].

3.2.1 Performance analysis

From the analysis in Fig. 15, it can be observed that all the algorithms perform efficiently on images. However, some studies report poor performance of Dlib on images containing many faces. While analyzing the performance of the different methods, the quality of the image should be considered. Also, a model’s accuracy varies with the angle of the face in the image, as studied in [64].

Fig. 15 Face detection speed analysis [64]

Although the effectiveness of an architecture can be influenced by the size and quality of the dataset, there are precisely defined parameters for assessing classification outcomes. Precision and recall are the evaluation metrics used to check the performance of a model. Precision is the proportion of positive predictions that are correct, while recall is the proportion of actual positives that are correctly classified. The closer the precision and recall are to 1, the more accurate the backbone network used. From Table 2, Dlib based on ResNet50 has precision and recall values closest to 1 in comparison to the other algorithms, making it an effective model.

Table 2 Detection accuracy comparison of algorithms
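As a small worked example of the two metrics, the sketch below computes precision and recall from lists of true and predicted labels. The label names and the toy predictions are assumptions made for illustration and are unrelated to the values reported in Table 2.

```python
def precision_recall(y_true, y_pred, positive="mask"):
    """Return (precision, recall) for the chosen positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    precision = tp / (tp + fp) if tp + fp else 0.0  # correct share of predicted positives
    recall = tp / (tp + fn) if tp + fn else 0.0     # found share of actual positives
    return precision, recall


# Hypothetical ground truth and predictions for five faces.
y_true = ["mask", "mask", "mask", "no_mask", "no_mask"]
y_pred = ["mask", "mask", "no_mask", "mask", "no_mask"]
precision, recall = precision_recall(y_true, y_pred)
```

With two true positives, one false positive, and one false negative, both metrics come out at 2/3 in this toy case.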

After face detection is performed successfully, the next step is to classify the detected faces. CLASSIFICATION is a supervised learning task in machine learning [90] that assigns the input data to a class label. The methods that can be used for classification are considered below.

  • Convolutional Neural Network

In deep learning, a CNN model is usually fed an image as input, which is then passed through multiple layers [3]. To begin with, the input is passed through successive convolutional layers with kernels, followed by a pooling layer. The pooling layer reduces the size of the feature maps and hence the number of learnable parameters and computations in the layers that follow. The result is afterwards passed through fully connected layers, the last of which applies a softmax function that outputs a probability for each class. The class with the maximum probability is then taken to be the class to which the object belongs.

A CNN can make use of varied backbone architectures to achieve the task. In [13], the VGG-16 architecture of CNN is discussed. Further, a real-time face mask detector that could be helpful in times like those of COVID-19 is demonstrated in [27]. Besides, [75, 93, 97, 99, 122] analyse the usage of the technique.
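Two of the steps described above, pooling and the final softmax, are simple enough to sketch in pure Python. The feature map values, the two class names, and the final-layer outputs are made-up numbers for illustration; a real CNN would be built and trained with a deep learning framework.

```python
import math


def max_pool_2x2(feature_map):
    """Halve each dimension by keeping the maximum of every 2x2 block."""
    pooled = []
    for y in range(0, len(feature_map) - 1, 2):
        row = []
        for x in range(0, len(feature_map[0]) - 1, 2):
            row.append(max(feature_map[y][x], feature_map[y][x + 1],
                           feature_map[y + 1][x], feature_map[y + 1][x + 1]))
        pooled.append(row)
    return pooled


def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    shift = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(v - shift) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


feature_map = [
    [1, 3, 2, 0],
    [4, 2, 1, 5],
    [0, 1, 7, 2],
    [3, 2, 1, 1],
]
pooled = max_pool_2x2(feature_map)  # 4x4 feature map reduced to 2x2

classes = ["mask", "no_mask"]
probabilities = softmax([2.0, 0.5])  # hypothetical final-layer outputs
predicted = classes[probabilities.index(max(probabilities))]
```

The pooled map has a quarter of the original entries, and the predicted class is the one with the largest softmax probability.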

  • Support Vector Machines

It is a method that divides the input data into different classes by constructing boundaries using hyperplanes. When working on multi-class data, each class is given its own binary classifier. [59] describes and demonstrates how SVM is used for image classification; it applies SVM to several datasets and compares the performance on each dataset with that of multiple other classifiers. Similar aspects are discussed in [45, 123].
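The idea of one binary classifier per class can be sketched with linear decision functions. The sketch covers prediction only: the hyperplane weights below are hand-picked, hypothetical values rather than the result of SVM training, and the third class name is an assumption added to make the example multi-class.

```python
def decision_value(weights, bias, features):
    """Signed score w.x + b; its sign tells the side of the hyperplane."""
    return sum(w * x for w, x in zip(weights, features)) + bias


def predict_one_vs_rest(classifiers, features):
    """Pick the class whose binary classifier gives the highest score."""
    scores = {
        label: decision_value(weights, bias, features)
        for label, (weights, bias) in classifiers.items()
    }
    return max(scores, key=scores.get)


# Hypothetical hyperplanes (weights, bias), one per class, not learned from data.
classifiers = {
    "mask": ([1.0, -1.0], 0.0),
    "no_mask": ([-1.0, 1.0], 0.0),
    "incorrect_mask": ([0.2, 0.2], -1.0),
}
label = predict_one_vs_rest(classifiers, [0.9, 0.1])
```

In a trained system, each set of weights would come from fitting a binary SVM that separates one class from all the others.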

  • Decision Trees

It is among the most useful algorithms for classification problems. It is a flow-chart-like structure in which each internal node tests a feature, each branch represents a test result, and each leaf node represents the decision, i.e., the class label [24]. In [73], decision trees and their specific algorithms are reviewed in depth. Correspondingly, [100] discusses work in the same domain.
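The flow-chart structure can be shown with a tiny hand-built tree stored as nested dictionaries. The feature names, thresholds, and labels are hypothetical and chosen only to illustrate how a sample travels from the root to a leaf; they are not taken from the reviewed studies, and a real tree would be learned from data.

```python
def classify(node, sample):
    """Walk from the root to a leaf, following the outcome of each feature test."""
    while "label" not in node:
        passed = sample[node["feature"]] >= node["threshold"]
        node = node["yes"] if passed else node["no"]
    return node["label"]


# Hypothetical tree: internal nodes test a feature, leaves carry a class label.
tree = {
    "feature": "lower_face_covered",
    "threshold": 0.5,
    "yes": {
        "feature": "nose_covered",
        "threshold": 0.5,
        "yes": {"label": "mask"},
        "no": {"label": "no_mask"},
    },
    "no": {"label": "no_mask"},
}

prediction = classify(tree, {"lower_face_covered": 0.9, "nose_covered": 0.8})
```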

  • Ensemble

This type of learning aims to produce a better predictive model by combining several other models. It typically works either by bagging (bootstrap aggregation) or by boosting. [4] reviews the available hybrid and ensemble methods in detail. Besides, an assessment of the process is described in [20].

The accuracy comparison chart in Fig. 16 analyses the results of several algorithms on the Simulated Masked Face Dataset (SMFD), as studied in [67, 75]. Although SVM achieved the highest accuracy, other components, like the selection of hyperparameters, also play a crucial role in deciding the feasibility of an algorithm. The combination of architecture, dataset, pre-processing, and the requirements of the problem statement determines the technique to be used.
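Bootstrap aggregation can be sketched as follows: several simple models are trained on random resamples of the data, and their predictions are combined by majority vote. The one-feature threshold learner, the fallback threshold of 0.5, and the toy data are assumptions made to keep the example short; they stand in for whatever base classifier is actually used.

```python
import random
from collections import Counter


def bootstrap_sample(data, rng):
    """Draw len(data) items from data with replacement."""
    return [rng.choice(data) for _ in data]


def train_stump(sample, default_threshold=0.5):
    """Toy base learner: threshold halfway between the two class means."""
    masked = [x for x, y in sample if y == "mask"]
    unmasked = [x for x, y in sample if y == "no_mask"]
    if masked and unmasked:
        threshold = (sum(masked) / len(masked) + sum(unmasked) / len(unmasked)) / 2
    else:
        threshold = default_threshold  # resample missed a class (assumed fallback)
    return lambda x: "mask" if x >= threshold else "no_mask"


def majority_vote(labels):
    """Return the most frequent label."""
    return Counter(labels).most_common(1)[0][0]


def bagging_fit(data, n_models=5, seed=0):
    """Train one stump per bootstrap resample."""
    rng = random.Random(seed)
    return [train_stump(bootstrap_sample(data, rng)) for _ in range(n_models)]


def bagging_predict(models, x):
    return majority_vote([model(x) for model in models])


# Hypothetical one-feature data: (fraction of lower face covered, label).
data = [(0.9, "mask"), (0.8, "mask"), (0.85, "mask"),
        (0.1, "no_mask"), (0.2, "no_mask"), (0.15, "no_mask")]
models = bagging_fit(data)
prediction = bagging_predict(models, 0.95)
```

An odd number of models is used so that a two-class vote cannot end in a tie.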

Fig. 16 Comparison of various classification algorithms

4 Analysis of real-time techniques

The comparison of different contemporary real-time detection techniques is shown in Table 3.

Table 3 Detection accuracy comparison of algorithms

5 Face mask detection techniques analysis

In terms of the approach used, Fig. 17 demonstrates the growth of each method since 2011. It can be observed that deep learning has gained much attention recently. The data has been collected using Semantic Scholar.

Fig. 17 Deep learning and machine learning usage trends over the years 2011–2021 (till April 2021)

Figure 18 shows the comparative percentage usage of the reviewed techniques in articles available on different e-sources from 2000 until April 2021. The articles in Fig. 18a were selected from Semantic Scholar using the keywords “technique name” + Face Mask Detection. Further, Fig. 18b depicts the articles chosen from Semantic Scholar using “technique name” + Face Detection, whereas for Fig. 18c the keywords “technique name” + Classification were used.

Fig. 18 Records of the different (a) object detection, (b) face detection, and (c) classification techniques analyzed over the years 2000–2021 (April) using Semantic Scholar

5.1 Popular techniques with advantages and challenges

A single algorithm cannot suffice for all needs. The choice of algorithm relies on many factors, including the size of the training data, speed, accuracy, training time, and number of features. No model can be declared the best among its counterparts, but a comparison can be put together to help in the choosing process [101, 105] (Tables 4, 5 and 6).

Table 4 List of advantages and disadvantages of some object detection algorithms (deep learning approach)
Table 5 List of advantages and disadvantages of various face detection methods
Table 6 List of advantages and disadvantages of various classifiers

6 Dataset

A dataset is a collection of instances used to train models. It can be created either by scraping the internet or by accessing various online repositories [107]. A few of the sources currently available online are shared in this article (Table 7).

Table 7 List of different datasets available on online platforms for the study

7 Several supporting software

These days, there is a plethora of programming languages, tools, libraries, and frameworks to choose from while working on a project, and there are no stringent rules for choosing among them. Nonetheless, this article lists specific tools that could be useful in a study.

The basic requirements for completing a face mask detector project are illustrated below (Fig. 19).

Fig. 19 Requirements for face mask detection system [36]

7.1 Dataset

A rich and relevant dataset can be accessed using the below-mentioned methods:

7.1.1 Data collection

It involves the accumulation of content pertinent to the problem at hand and is usually performed in accordance with the task to be executed. Various methods are available for preparing one’s own dataset. Some of the tools that could be used for the purpose are shown below (Fig. 20).

Fig. 20 List of tools used in data collection [36]

7.1.2 Annotating image

One of the essential steps in dealing with an image dataset is annotation, which refers to labeling images so that they can later be used by the machine learning model. Various approaches are now available for this. Some of them are (Fig. 21):

Fig. 21 List of tools used in image annotation [36]. *The dataset could be enriched by deploying techniques like data augmentation

7.2 Model

Below are libraries and frameworks typical of the different implementation techniques mentioned above. They can be installed as required by the task and the model used, and the desired modules can then be imported from the respective library (Fig. 22).

Fig. 22 List of useful Python libraries [18]

Open-source libraries and frameworks play a significant role in model creation. Figure 23 ranks the libraries by GitHub star count, as reported in the official documentation of the respective library on PyPI until April 2021. The assessment could help the uninitiated begin working with such user-friendly libraries.

Fig. 23 Popularity of useful Python libraries based on GitHub star statistics (till April 2021)

7.3 Python

Some of the other useful open-source libraries that can be combined with the essential packages are discussed in this section (Fig. 24).

Fig. 24 List of other supporting libraries [18]

8 Applications, limitations, and observations

Certain areas where face mask detection can be effectively employed are discussed below.

  • Transit hubs

At places like airports and railway stations, face mask detectors integrated with security cameras can be used to check whether travellers are wearing face masks. Passengers’ faces could be detected throughout the premises, and the authorities could be informed immediately if any violation is detected.

  • Workplaces

A mechanism to observe whether an employee is wearing a face mask could be incorporated in an office. A warning message could be sent to people who are not following the safety precautions. Also, a daily record of people not complying with the regulations could be maintained.

  • Healthcare centres

In various healthcare organizations and hospitals, a face mask detection system could track whether health workers wear face masks during their shifts. Besides, it could help in alerting visitors who enter the site without face masks. The officials could be informed immediately in case of non-compliance.

  • Surveillance systems

Face mask detection systems integrated with surveillance cameras can help track strictly whether people in public areas are wearing face masks.

8.1 Limitations

Although the system performs efficiently in real time, it faces the following challenges.

  • Although different network architectures perform well in mask detection tasks, the models are limited by their performance on large datasets [65].

  • Irregularities in images, such as insufficient light and side-angle views, need proper attention [116].

  • Another major challenge is to achieve high accuracy in the least possible time [97].

  • Additionally, video analysis poses difficulties, including motion blur and transitions between frames [64].

8.2 Observations

  • Although two-stage detectors excel in accuracy, one-stage detectors outperform them under real-time requirements. Hence, for detection on a real-time video feed, algorithms like YOLO and SSD are preferable.

  • Since training a deep neural network is expensive owing to its high computational complexity, transfer learning, i.e., utilizing pre-trained models like MobileNet and VGG-16, is recommended.

  • Owing to the exceptional results that deep learning models produce, they have become the choice of various practitioners. Though they perform efficiently with high accuracy, applying different backbone architectures with different hyperparameters could result in even better accuracy.

  • Also, poor images in the dataset, such as those with insufficient light or side-angle views, have affected the performance of the models. Hence, dataset quality could be improved further for future use.

  • Though many studies have been dedicated to COVID-19 of late, there is still scope for much more analysis in the healthcare domain.

After reviewing many studies, it can be inferred that, despite the variety of techniques available to implement the model, one-stage object detectors are the preferred choice for real-time requirements. The accuracy with which they work in real time makes the application possible. Drawbacks arising from computational costs could be dealt with by altering the architectures, hyperparameters, input size, etc.

9 Conclusion and future directions

To deal with the pandemic more effectively, developing central systems capable of automatically detecting whether a person is wearing a face mask has become an engaging topic for people working in this sphere. Numerous studies have been initiated lately in this domain. This paper aims to provide a detailed review of the various ways in which such a system could be implemented. After inspecting the implementation techniques, it can be safely stated that deep learning has become popular among researchers in recent times; the efficiency of the approach makes it suitable for such tasks. Additionally, despite many datasets being available, the RMFD dataset is widely used. If used constructively, the deployment of the model could be beneficial in public areas. In future work, the system could be upgraded by integrating it with automated thermal detection systems. A check on whether social distancing, i.e., sufficient physical distance between people, is being maintained in crowded areas could be an add-on to the system. A facial landmark detection feature could be added for biometric purposes. Moreover, owing to the versatility of the state-of-the-art techniques, their architectures could be enhanced to achieve better results at a faster speed. As shown in Fig. 25, there has been an upsurge in the usage of deep learning methods. Taking advantage of the enormous utility of these methods, various future studies could be carried out in this domain. The quality of datasets could be improved by removing images with insufficient light. The system could also be combined with a model that detects the type of mask a person is wearing. Besides, new feature extraction techniques could be explored using machine learning algorithms.

Fig. 25 Upsurge of deep learning from March 2013 to August 2021 (created using Google Trends)