1 Introduction

In any learning program, the full attendance of participants is vital to ensuring that the program’s objectives are achieved. This is closely related to the teaching process in educational institutions, where full attendance is very important to ensure that no student is left behind. According to Jacksi et al. [6], most institutions require a minimum percentage of class attendance. However, such policies are often not complied with due to the various challenges faced by current attendance recording methods. The conventional process for recording student attendance is to collect each student’s signature manually. Besides the time wasted in collecting signatures and preparing the hardcopy forms, other visible disadvantages include the potential loss of, or damage to, the attendance sheets. Numerous systems have been developed for attendance measurement, with clickers and swipe identification cards among the methods used to track student attendance.

In the same context, many projects have been carried out. Among them is the RFID-Based Student Attendance Management System [1], in which RFID is used to record student attendance with the main purpose of reducing the time wasted in recording it. There is also an Android-based smart student attendance system [4], which was proposed as a mobile online solution to record attendance faster, cheaper, and with automated attendance reports. Another RFID system integrated with face recognition [5] has also been proposed to verify and count approved students as they enter and leave the classroom. The system mentioned above is very useful, taking into account the current trend in which Android is increasingly deployed. Another Android-based system has been introduced by Sunaryono et al. [14], which avoids the long queues seen in earlier automated attendance-recording processes.

Referring to the validity of the data collected, the methods described above still have loopholes that allow abuse to occur. Since a person’s actual presence is very important, a key method that can be used to detect actual presence is face recognition. Usually, this face recognition process [13] serves as biometric verification when used for security purposes. Biometric data is unique to each individual and can thus be used for identification. Face recognition offers several advantages over traditional card-based, fingerprint, or iris recognition, among them the ability to detect authenticity, user-friendliness, and the absence of any need for physical contact. From security to education, every field has begun to adopt biometric technology for identity tracking applications.

Many face recognition applications have been developed, but most must first be downloaded and installed on the user’s device before they can be used. Therefore, to spare users this burden, in this paper we propose a face recognition system that can be accessed directly without having to download and install a new application. The idea is to develop an online application that runs on a web server and is easily accessible through a web browser from any terminal.

Referring to the latest scenario with the emergence of the Covid-19 global pandemic declared by the WHO [7, 9, 12], there is a serious need for safety measures, with face masks being one of the main means of overcoming this pandemic. To ensure the safety of the public within the scope of our focus, plans or strategies must be made so that we can monitor participants’ compliance with these basic safety principles. As this project arose from concerns about the methods used by organizers or lecturers to keep track of attendance, the proposed system integrates two features, namely face recognition for attendance recording and face mask detection [8] to check for safety compliance. Concerning face mask detection, one existing system identified is the mask detection system [2], which was built based on YOLOv3. This work is very relevant to what we propose in this study, considering the norm of wearing a face mask during this pandemic to prevent the spread of the Covid-19 virus. In this study, after taking into account the need to recognize faces and detect masks, as well as the need to optimize attendance-recording time, the face mask detection process uses the same face-recognition mechanism, but with some improvements in data pre-processing.

Based on the issues discussed above, as well as some of the aims we want to achieve, the overall focus of this project is on developing an online attendance recording system based on face recognition and face mask detection. The face recognition process is required to identify the user’s identity for recording attendance, while the detection of the face mask is to identify safety compliance by the user. Incorporating these two features into one system allows the system to detect a person’s identity, regardless of whether they are exposing their entire face or wearing a face mask. The system is developed as a web-based application, which can offer easy access to users via a web browser from any terminal to record their attendance without having to download or install a specific application.

2 Methodology

In this section, the overall design of the system will be presented, together with the flow of activities that take place in the system. Overall, the designed system consists of two main components, as illustrated in Fig. 1, namely the server application and client-side application. The server application will be presented first in the next subsection, followed by the description of the client-side application in the following subsection.

Fig. 1

Overall System Architecture

2.1 Server application based on Python

The first step in developing this server application is to train a model able to recognize faces. For this purpose, facial images of all users need to be collected and saved for model training. Since this system has a special feature for recognizing faces behind face masks, two datasets are required: one containing all the original user faces without face masks, and another synthetically processed by applying a face mask to each available sample image. Thus, for each individual, there will be two sets of sample images, namely the original face images and the same images with synthetically applied face masks (refer to Table 1).

Table 1 Comparison between an original data sample and the generated synthetic data for facial identification and face mask detection

Between 10 and 30 original face images were collected for each individual and saved in a dedicated folder named according to the user’s identity. For example, a folder named ‘Haikal’ will contain all sample images of an individual named Haikal. As for the synthetic dataset, another folder called ‘Haikal_Mask’ will contain Haikal’s original images that have been synthesized with the application of virtual masks.

The objective of collecting sample images in these two different categories is to train the model for identifying the individuals’ faces with and without face masks. This way, if a user performs identification using a face mask, then both identity recognition and detection of mask-wearing will be carried out.

With regard to recognizer model training, a dedicated Python program is run to train the model until it can perform face recognition based on the existing dataset. This trained model is invoked by the Python program that performs face recognition and outputs the recognized user’s identity as a label. For each user, the model can produce two possible labels, i.e. ‘Haikal’ or ‘Haikal_Mask’. If the identification returns ‘Haikal’ only, the identified user (Haikal) is not wearing a face mask, whereas the label ‘Haikal_Mask’ means that the identified user (Haikal) is wearing one.
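This label convention can be decoded with a simple string check. The sketch below is illustrative only (the helper name `parse_label` is ours, not from the original code); it assumes the ‘_Mask’ suffix convention described above:

```python
def parse_label(label):
    """Split a recognizer label into (identity, wearing_mask).

    Labels that end in '_Mask' come from the synthetic masked dataset,
    so they indicate the user was recognized while wearing a face mask.
    """
    suffix = "_Mask"
    if label.endswith(suffix):
        return label[:-len(suffix)], True
    return label, False
```

For example, `parse_label("Haikal_Mask")` yields the identity `"Haikal"` together with a mask-wearing status of `True`.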

2.2 Client-side application

To use this system, all the user has to do is open the URL of this system through a browser. On the front page of this site, the user will be able to activate the camera to take a selfie. This web page is created using HTML, JavaScript, and CSS that can capture the selfie image and subsequently send the image to the server for processing after a click of a button. The server application that receives the image will activate a Python script that will perform a face recognition process to determine the identity of the face owner. The two main outputs produced by this Python script are the name of the identified face owner and also the status of whether the person is wearing a mask or not. The results will be displayed on the web page in the form of a processed selfie image printed with output labels.

Figure 2 illustrates, in general, how the face recognizer model is trained. Initially, a set of data samples is collected for each individual, comprising 10–30 facial images without a face mask. Then, a new collection of synthetic data samples is generated for the same individual by virtually applying a mask to each of the original images. The original data samples are saved in a folder labeled with the individual’s name, while the folder storing the synthetic data samples carries the individual’s name with a mask label appended. Both sets of data samples for each individual are used to train the model to recognize the individual’s face and whether he or she is wearing a mask. Once ready, this trained model is used by a Python script that performs facial recognition to produce two outputs: the identified individual’s name and the mask-wearing status.

Fig. 2

Face-recognizer model training flow chart

Figure 3 illustrates the flow of activities that occur when a user begins to open the system’s main URL in a browser. To record attendance, the user will initially take a selfie via webcam and ensure that his or her face is taken in full. Then the user will click a button to upload the captured image to the server for processing.

Fig. 3

The overall flow of activities

The application server that receives the image will then activate a Python script that recognizes the face in the image. The Python script returns the output in the form of the individual’s name identified from the image, as well as the mask-wearing status. This information is used to record attendance in the database. The final result displayed to the user on the webpage is the selfie picture labeled with the face recognition information and mask-wearing status.

2.3 Python code for model training and facial recognition

This online attendance system performs face recognition with face mask detection driven by a Python program. Overall, three stages of Python code [10] are run for the recognition process to take place: the extract-embeddings code runs first, followed by the train-model code and, lastly, the recognition code. Each of these is elaborated on in the subsections below.

2.3.1 Extract embeddings Python code

Figure 4 shows the code used to create a 128-D vector representing a face using a deep learning feature extractor [10]. To construct embeddings, all faces of a user in the dataset are passed through the neural network. The ‘imagePaths’ variable contains the path to each image in the dataset. The embeddings and corresponding names are held in two lists known as ‘knownEmbeddings’ and ‘knownNames’, while the variable ‘total’ tracks how many faces have been processed.

Fig. 4

A snippet of codes to extract embeddings [10]
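The actual code in Fig. 4 comes from Rosebrock [10] and relies on a pre-trained OpenCV face embedder. The sketch below mimics only its bookkeeping: a stand-in embedder produces deterministic 128-D vectors, so the roles of ‘knownEmbeddings’, ‘knownNames’, and ‘total’ are visible without the model files.

```python
import pickle
import random

def embed_face(image_path):
    # Stand-in for the deep-learning feature extractor: the real code
    # passes the face crop through a neural network to get a 128-D vector.
    rng = random.Random(image_path)   # deterministic per image path
    return [rng.random() for _ in range(128)]

# In the real dataset these paths are discovered on disk; the folder
# name (e.g. 'Haikal' or 'Haikal_Mask') carries the training label.
imagePaths = ["dataset/Haikal/0001.png", "dataset/Haikal_Mask/0001.png"]

knownEmbeddings, knownNames = [], []
total = 0
for path in imagePaths:
    name = path.split("/")[-2]        # label taken from the folder name
    knownEmbeddings.append(embed_face(path))
    knownNames.append(name)
    total += 1

# Serialize the embeddings and labels together, as the original script does
data = pickle.dumps({"embeddings": knownEmbeddings, "names": knownNames})
```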

2.3.2 Train model Python code

Once the face embeddings have been extracted from the dataset, face recognition model training commences by running the Python code [10] shown in Fig. 5. The training process starts by initializing an SVM model, which is then fitted to the embeddings. After training, the model and the label encoder are serialized to disk as pickle files. This allows the trained model to be loaded by the recognizer code to process input images for facial recognition.

Fig. 5

Codes used in training face recognizer model [10]
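Rosebrock’s training code [10] fits a linear scikit-learn SVM with probability estimates on the extracted embeddings. The sketch below follows that recipe, but the embeddings are synthetic stand-ins (two well-separated clusters) so that the example is self-contained:

```python
import pickle
import random
from sklearn.preprocessing import LabelEncoder
from sklearn.svm import SVC

# Toy stand-in embeddings: each label gets a well-separated cluster so the
# example trains instantly (real embeddings come from the deep extractor).
rng = random.Random(42)
centers = {"Haikal": 0.2, "Haikal_Mask": 0.8}

def toy_embedding(label):
    c = centers[label]
    return [c + rng.uniform(-0.05, 0.05) for _ in range(128)]

names = ["Haikal"] * 10 + ["Haikal_Mask"] * 10
embeddings = [toy_embedding(n) for n in names]

# Encode the string labels and fit a linear SVM with probability estimates
le = LabelEncoder()
labels = le.fit_transform(names)
recognizer = SVC(C=1.0, kernel="linear", probability=True)
recognizer.fit(embeddings, labels)

# Serialize the trained model and the label encoder as pickle payloads
model_bytes = pickle.dumps(recognizer)
encoder_bytes = pickle.dumps(le)

# The recognizer can now map a new embedding back to a label
pred = recognizer.predict([toy_embedding("Haikal_Mask")])[0]
```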

2.3.3 Facial recognizer codes

Code for facial recognition [10] on input images has been employed. Generally, this code detects faces, extracts embeddings, and queries the SVM model to determine who is in the image. Figure 6 shows a snippet of the modified code, in which the input image is processed for face recognition. Since the original input image is in RGB format, it is first converted to grayscale. This removes information unnecessary for face recognition, such as color and lighting effects. In this context, an analysis was made comparing face data trained in RGB and in grayscale form. For RGB images, recognition accuracy is around 10% to 50%, while for grayscale images it is around 20% to 70%. This indicates that recognition accuracy on grayscale images is better than on RGB images.

Fig. 6

A snippet of codes to recognize a face
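The conversion step in Fig. 6 is done with OpenCV in the actual code. As a self-contained illustration, the helper below applies the standard ITU-R BT.601 luma weights (the same weighting OpenCV’s color conversion uses) with NumPy:

```python
import numpy as np

def to_grayscale(rgb):
    """Convert an H x W x 3 RGB image to single-channel grayscale using
    the ITU-R BT.601 luma weights (0.299 R + 0.587 G + 0.114 B)."""
    weights = np.array([0.299, 0.587, 0.114])
    return np.rint(rgb.astype(float) @ weights).astype(np.uint8)

# Toy 2 x 2 image: red, green, blue, and white pixels
img = np.array([[[255, 0, 0], [0, 255, 0]],
                [[0, 0, 255], [255, 255, 255]]], dtype=np.uint8)
gray = to_grayscale(img)   # shape (2, 2), one intensity per pixel
```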

2.3.4 Facial mask detection

We have explored two methods for the additional process of detecting face masks. The first approach is to employ code separate from the face recognition, namely the “face mask detector” code from Rosebrock [11]; a code snippet is shown in Fig. 7. In this first approach, the facial recognition and mask detection processes are carried out by two different programs.

Fig. 7

A snippet of codes to detect facial mask [11] (in the first approach)

For the second approach, we employ the same facial recognition program from Rosebrock [10] to also detect face masks, by generating an additional synthetic dataset before model training. This synthetic dataset is generated by adding a virtual face mask layer to every sample image of each individual, forming a new mask dataset for that individual. With this mask dataset, the trained model can also recognize the owner’s face even when a face mask is worn; thus, face mask detection is achieved through face recognition itself.
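The paper’s synthetic dataset was produced with a virtual mask overlay; as a simplified, hypothetical stand-in, the sketch below paints a flat block over the lower half of each face image, which conveys the principle of covering the mask region before training:

```python
import numpy as np

def apply_virtual_mask(face, color=(220, 220, 255)):
    """Return a copy of the face image with the lower half covered by a
    flat 'mask' colour -- a crude stand-in for a realistic virtual mask."""
    masked = face.copy()
    h = face.shape[0]
    masked[h // 2:, :] = color
    return masked

# Build a masked twin for every sample, mirroring the Haikal /
# Haikal_Mask folder pairing described in the text
samples = {"Haikal": [np.zeros((64, 64, 3), dtype=np.uint8)]}
samples["Haikal_Mask"] = [apply_virtual_mask(f) for f in samples["Haikal"]]
```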

These two methods are compared, particularly in terms of the time required to produce both outputs, namely face recognition and face mask detection.

2.4 Online web scripting

2.4.1 Interface to capture input image through a browser

To allow input images to be captured from the camera through a browser, HTML, JavaScript, and CSS have been coded as the front page of the system. This script activates the locally attached camera every time the page is opened in the browser. Once the camera is active, the user can view its live feed and, when ready, press the ‘Take snapshot’ button to capture a picture. The user can first examine the captured picture and, if it is acceptable, send it to the server for processing. Figure 8 shows the HTML and JavaScript written for the front page of this system.

Fig. 8

HTML and JavaScript of the front page

2.4.2 PHP script for processing images and displaying output

A PHP script has been written to process pictures sent by users through their browsers. First, this script saves the received picture file and records its file name. Then, it activates the Python program to run the face recognition process, passing the name of the received image file as the input image to be processed. Figure 9 shows the PHP script written for this purpose.

Fig. 9

PHP script to handle uploaded image

As soon as the called Python program finishes running, the PHP script will redirect the user to the next web page that will display the generated output from the Python’s face recognition process, along with the output for mask detection (see Fig. 10).

Fig. 10

PHP script to display the output

2.5 Hardware requirement

The hardware required for this project is a computer used as a server to train the model and run the application server. Another computer is needed as a client device that needs to be equipped with a camera to take pictures for face detection. The last requirement is an internet connection to allow the user to access the server through a browser from the client terminal.

3 Results and discussion

This section will present the implementation of this project in the form of a server application, as well as the results obtained and an analysis of the results.

3.1 Main web interface

Figure 11 shows the main web page of this system, created with HTML, JavaScript and PHP. Once opened, this page automatically tries to activate the camera on the client terminal. The user needs to click the approval button for camera use and then take a selfie. The captured image is uploaded to the online attendance system server as the input for facial recognition, as well as face mask detection.

Fig. 11

The main front-end webpage accessible from the server

Figure 12 shows an example in which the user has successfully taken their selfie and the image is displayed on this web page. To capture a selfie photo, the user just needs to press the ‘Take Snapshot’ button and the captured image will appear in the right image frame on this page. The user is allowed to re-take pictures if the captured image is not acceptable. If the captured image is acceptable, then the user can press the submit button to upload it to the server for the face recognition process.

Fig. 12

Output result once the user has taken a selfie image through a webcam

3.2 Image processing on the server

Figure 13 shows a temporary image repository on the server showing selfie images that have been successfully uploaded to the server after the submit button is pressed. All selfie images are saved as PNG files.

Fig. 13

All uploaded images are stored in one folder

On the server, a PHP script handles the process after the selfie image is uploaded. This PHP script activates the Python program that runs the face recognition and face mask detection processes. The same PHP file also displays the output generated by the Python program to the user through the browser.

Figure 14 shows the output file generated after the Python recognition script was run. This file is displayed to the user to show successful face recognition. A bounding box is drawn around the detected face area, with an identification label and recognition accuracy value written on the bounding box.

Fig. 14

The output file in the form of an image annotated with the recognized identity

Fig. 15

The first method capturing user without the mask

3.3 Comparison between two approaches for face mask detection

At the beginning of this study, the face recognition and face mask detection processes were implemented as two different Python scripts. The first Python script [10] is activated by PHP to detect the user’s identity from the submitted facial image, while the second Python script [11] is activated specifically to detect whether the user is wearing a mask, based on the same input image. However, running these two scripts takes a long time on the server. Therefore, another approach was proposed to achieve a more optimized processing time.

The second method relies on one Python script only, namely the face recognition script based on the pre-trained model [10]. However, some improvements were made to allow it to also detect the use of face masks. One important improvement is the use of synthetic data as a dataset representing all users’ faces with virtual face masks applied. This synthetic dataset is also used in model training, so that the resulting pre-trained model can recognize not only the individual’s identity but also whether he or she is wearing a face mask.

Before this second approach was adopted, a few analyses were performed to compare the processing time taken by the two methods. As shown in Table 2, method 2 is much faster, taking about 5 to 6 seconds to produce the face recognition and face mask detection outputs, whereas method 1 requires around 20 to 24 seconds to produce the same outputs. We can therefore conclude that method 2 is the more optimized option in terms of time. Although it relies on a single Python script, it successfully performs both face recognition and face mask detection with a shorter processing time.
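The measurement pattern behind Table 2 can be reproduced with `time.perf_counter`. In the sketch below, the two pipelines are hypothetical stand-ins (short sleeps in place of the real scripts, which took roughly 20–24 s and 5–6 s respectively); only the timing scaffold is the point:

```python
import time

def method_1():
    # Two separate programs: face recognition, then mask detection
    time.sleep(0.02)   # stand-in for the recognition script [10]
    time.sleep(0.02)   # stand-in for the mask detector script [11]

def method_2():
    # One script whose model handles identity and mask status together
    time.sleep(0.02)

def average_time(fn, repeats=3):
    """Average wall-clock time of fn over a few repetitions."""
    start = time.perf_counter()
    for _ in range(repeats):
        fn()
    return (time.perf_counter() - start) / repeats

t1 = average_time(method_1)
t2 = average_time(method_2)
```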

Fig. 16

The second method capturing user without the mask

Table 2 Comparison of processing time for both methods

3.4 Analysis of the similarity of facial recognition and face mask detection results

This section demonstrates the analysis used to observe the effectiveness of this program in performing face recognition and face mask detection. Figures 19, 20 and 21 show the output obtained for several facial samples using the second approach to face recognition and mask detection.

Fig. 17

The first method capturing user with a mask

Fig. 18

The second method capturing user with a mask

Fig. 19

a The result of the face recognition system with 40.67% similarity of the first subject not wearing a mask b The result of the face recognition system with 48% similarity of the first subject wearing a mask

Fig. 20

a The result of the face recognition system with 29.44% similarity of the second subject not wearing a mask b The result of the face recognition system with 32.50% similarity of the second subject wearing a mask

Fig. 21

a The result of the face recognition system with 33.07% similarity of the third subject not wearing a mask b The result of the face recognition system with 15.38% similarity of the third subject wearing a mask

Fig. 22

Attendance information is recorded in the MySQL database on the server

3.4.1 The equations for accuracy, sensitivity and precision [3]

To measure the accuracy of facial recognition, the accuracy equation is used, as in Eq. (1).

$$ \boldsymbol{Accuracy}=\frac{TP+ TN}{\left( TP+ TN+ FP+ FN\right)} $$
(1)

To measure the sensitivity of facial recognition, the sensitivity equation is used, as in Eq. (2).

$$ \boldsymbol{Sensitivity}=\frac{TP}{\left( TP+ FN\right)} $$
(2)

To measure the precision of facial recognition, the precision equation is used, as in Eq. (3).

$$ \boldsymbol{Precision}=\frac{TP}{\left( TP+ FP\right)} $$
(3)

where TP denotes true positives, TN true negatives, FP false positives, and FN false negatives.
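Equations (1)–(3) translate directly into code. The small helpers below (our naming, with hypothetical example counts that are not the paper’s data) compute the three metrics from raw counts:

```python
def accuracy(tp, tn, fp, fn):
    # Eq. (1): share of all decisions that were correct
    return (tp + tn) / (tp + tn + fp + fn)

def sensitivity(tp, fn):
    # Eq. (2): share of actual positives that were detected
    return tp / (tp + fn)

def precision(tp, fp):
    # Eq. (3): share of positive calls that were correct
    return tp / (tp + fp)

# Hypothetical counts (for illustration only): 8 TP, 10 TN, 0 FP, 2 FN
acc = accuracy(8, 10, 0, 2)    # 18 / 20 = 0.9
sens = sensitivity(8, 2)       # 8 / 10 = 0.8
prec = precision(8, 0)         # 8 / 8 = 1.0
```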

3.4.2 Facial recognition accuracy analysis

From the 10 datasets of facial image samples collected, the accuracy of the face recognition process has been calculated as shown in Tables 3 and 4 using Eq. (1).

Table 3 Analysis of facial recognition accuracy
Table 4 Calculation on facial recognition analysis for Eq. (1) only

3.4.3 Mask detection accuracy analysis

From the 24 samples of synthetic datasets of face mask detection samples collected, the accuracy of the mask detection process has been calculated as shown in Tables 5 and 6, based on Eqs. (1), (2) and (3).

Table 5 Analysis of face mask detection accuracy
Table 6 Calculation on face mask detection analysis for Eq. (1), (2) and (3)

Based on the testing and analysis, the accuracy of the system is calculated at 81.8% for face recognition and 80% for face mask detection; the sensitivity for face mask detection is likewise 80%. The precision for face mask detection is 100%, which is a very decent level, based on the calculations performed earlier. More data, including synthetic data, may be required to train the recognizer further so that face recognition and face mask detection can be performed with greater accuracy, sensitivity, and precision.

3.5 Recording attendance information into the database

The main purpose of this system is to record the presence or attendance of users online through a browser by recognizing the faces captured through a webcam. For this record-keeping, a database was built to store all the data captured. This includes the time a user submitted the picture, the identified user’s username and the status of whether the user is wearing a mask or not. Since this is an early prototype, the interface for this database was quickly developed (see Fig. 22).
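The prototype stores these records in MySQL via PHP. As a self-contained illustration of the record layout (submission time, recognized username, and mask status), the sketch below uses Python’s built-in sqlite3 module as a stand-in for MySQL; the table and column names are our own choices, not the prototype’s schema:

```python
import sqlite3
from datetime import datetime

# In-memory SQLite database standing in for the server's MySQL database
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE attendance (
        id           INTEGER PRIMARY KEY AUTOINCREMENT,
        username     TEXT    NOT NULL,
        wearing_mask INTEGER NOT NULL,  -- 1 = mask detected, 0 = no mask
        submitted_at TEXT    NOT NULL
    )
""")

def record_attendance(conn, username, wearing_mask):
    """Insert one attendance row for a recognized user."""
    conn.execute(
        "INSERT INTO attendance (username, wearing_mask, submitted_at) "
        "VALUES (?, ?, ?)",
        (username, int(wearing_mask), datetime.now().isoformat()),
    )
    conn.commit()

record_attendance(conn, "Haikal", True)
rows = conn.execute(
    "SELECT username, wearing_mask FROM attendance").fetchall()
```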

4 Conclusion

This project has achieved its targeted objectives. First of all, a prototype system was successfully built to capture online attendance records, in which user identities are recognized based on facial biometrics. This system is a web-based application and so users can access the system interface from any browser regardless of terminal. Thus, the user is not burdened by the need to install special applications for this purpose.

In addition to the server interface script, the server application consists of a Python program used for the face recognition process, which processes each selfie image uploaded by the user to identify the user’s identity. This Python program requires a trained model for face recognition. One purpose of this system is to allow users to be recognized whether or not they are wearing masks. Therefore, in training the face recognition model, the sample dataset consisted of original images of each user’s face, as well as synthetically generated images with virtual face masks applied. More than 200 user face images were successfully obtained for testing and analysis purposes.

Some limitations of this system have been identified. For example, an insufficient sample of user face data reduces the accuracy of the face recognition and face mask detection processes; to obtain better accuracy, more samples of user face data are needed. Overall, the project successfully produced a system that can be used to take attendance more easily and quickly, as well as monitor public safety by performing mask detection to curb the spread of the Covid-19 virus.