1 Introduction

Digital learning platforms have created affordable learning opportunities for the masses worldwide and have enhanced the learning process by making educational resources readily accessible. After the COVID-19 outbreak, the teaching-learning scenario shifted dramatically to digital platforms [40]. This sudden shift from physical to virtual classrooms around the globe has raised many challenges [27]. One such limitation is the lack of physical presence. Due to the absence of face-to-face interaction with the instructor, learners lose motivation and interest, which affects their learning performance [4]. As a result, learners do not complete the online course or leave the online classroom mid-way. It therefore becomes important to know how well a learner is engaged in the online learning environment, and researchers are now working hard to meet these challenges of online learning [3].

The degree of engagement is generally reflected in the emotional involvement of learners while studying [56]. Emotions strongly affect learning performance [45]. For instance, positive emotions like joy and curiosity facilitate self-regulation, help learners focus on problem-solving tasks, and thus keep them engaged [26]. On the other hand, negative emotions like boredom and frustration divert attention, consume cognitive resources during learning activities, and leave learners disengaged [54]. Thus, the present study focuses on the automatic recognition of the facial emotions of online learners. Recognising learners' facial emotions in real time is quite challenging on an e-learning platform owing to the absence of human supervision. In such a learning environment, learners' facial expressions are analysed in real time to extract different emotions. Facial expressions are physical muscle movements translated from emotional impulses, such as raising the eyebrows, wrinkling the forehead, or curling the lips. By automatically observing changes in facial expressions, a great deal of information about an online learner's emotional state can be determined [60]. For instance, consider the images in Fig. 1 and estimate each person's state of mind from their expression.

Fig. 1
figure 1

Different states of mind of human beings. (a) anticipation, (b) sleeping (c) happiness, (d) peace, (e) annoyance and (f) frustration

The anticipatory state can be recognised in Fig. 1(a) as a learner is engrossed in studying. The learners in Fig. 1(b) are sleeping while studying with the laptop on. The learner in Fig. 1(c) is in a happy state while studying. In Fig. 1(d), the learner is peacefully studying. The annoyance state of the learner can be seen in Fig. 1(e) because the learner does not understand the concept. The state of frustration can be seen in Fig. 1(f) as a learner is not able to focus on his studies.

The famous psychologist Paul Ekman divided facial expressions into six basic emotions [12]. Since then, many researchers have universally accepted these basic emotions for facial expression recognition (FER) research. The six basic emotions are surprise, sadness, happiness, fear, disgust and anger, shown in Fig. 2. Automatically analysing the facial expressions of learners therefore helps decide a learner's engagement state in real-time scenarios.

Fig. 2
figure 2

Facial expressions of different emotions (a) Happy, (b) Sad, (c) Disgust, (d) Angry, (e) Contempt, (f) Confused (g) Fear, (h) Neutral and (i) Surprised

Considering the need to develop a real-time engagement detection system, this paper proposes a novel approach based on deep learning technologies. The contributions of this paper are as follows.

  • An engagement detection system that automatically detects learner engagement in real-time scenarios based on facial emotion recognition is proposed.

  • Online learner’s engagement is evaluated based on the facial emotion information through real-time facial expression analysis.

  • Face detection is done with the help of the pre-trained Faster R-CNN model.

  • A modified landmark extractor, MFACEXTOR, is proposed to extract 470 facial key points (face-points).

  • Deep learning models such as Inception-V3, VGG19 and ResNet-50 are implemented for real-time learning scenarios to classify student emotions such as angry, sad, happy, neutral etc., with the help of the softmax function.

  • An Engagement evaluation algorithm is proposed to calculate the engagement index from facial emotion classification output data.

  • Finally, based on the engagement index value, the system decides whether the online learner is engaged or disengaged.

The rest of the paper is organised as follows. Section 2 gives a brief overview of related work on engagement detection in online learning environments. Section 3 discusses the datasets used for the experiments. Section 4 describes the methods and the proposed system. Section 5 presents the experimental results achieved with the proposed approach. Section 6 presents the visualisation of the engagement detection system. Section 7 compares the proposed system with state-of-the-art models. Section 8 concludes the paper with future work.

2 Related work

In recent years, engagement detection during online learning has been gaining attention. Engagement detection is essential in the online classroom to keep learners engaged and enhance learning performance, but earlier studies were largely confined to the traditional classroom [13, 35]. Many studies report that faculty teaching through virtual mediums believe they can assess learners' understanding better in a face-to-face classroom than in an online learning environment [7, 22]. Researchers have recently started investigating the impact of monitoring learner engagement during online content delivery [15, 21]. Various approaches have been implemented for engagement detection, but facial expression is one of the most popular and successful [6, 44] because it is a visual clue to the emotional state of the learner [36] and face image datasets are easier to collect [38]. Zhang et al. [59] proposed a multi-task cascaded CNN framework that uses a three-stage cascaded structure to boost face detection performance.

Most recent work on facial expression recognition (FER) performs well on controlled image datasets but not on partially visible or pose-variant face images. Using facial features, [37] proposed a conceptual framework for the classification of learner engagement. The proposed model detects multiple faces, extracts facial action units, and uses an SVM (support vector machine) as a binary classifier; it was tested on various datasets and compared with the best-configured models. Such automated systems can act as an important tool to identify online learners' attentive and inattentive states. Deep learning networks have been successful in implementing engagement detection using facial expressions [8, 43, 50]. Turabzadeh et al. [55] researched real-time facial emotion recognition using the LBP (Local Binary Pattern) algorithm: features were extracted from the captured video using LBP and fed into a K-NN (k-nearest neighbour) regression with dimensional labels, yielding an accuracy of 51.28%. Murshed et al. [46] explored three models, namely network-in-network (NiN-CNN), all-convolutional network (All-CNN), and very deep convolutional network (VD-CNN) [51]. The advantageous features of these models were combined into an improved model in which a multi-layer perceptron replaces the linear convolutional layer, further convolutional layers replace a few max-pooling layers, and small (3x3) convolutional filters are used throughout the depth of the network. Engagement detection performance was measured, and the proposed model was compared with the three base models on the DAiSEE dataset in e-environments. Li et al. [31] proposed a real-time facial emotion recognition system for learners using the Xception model, with Haar-Cascade used for face detection [19]; the depthwise separable convolution method was used with pre-activation in the residual block, which reduced training complexity and the risk of overfitting. Altuwairqi et al. [5] proposed a multimodal approach to measure learner engagement in which three modalities were analysed to represent learner behaviour; several experiments validated the approach, with a recorded accuracy of 76.19%.

Reliable models play a key role in detecting learner engagement during educational activities. Li et al. [30] proposed a multi-kernel convolution block for facial expression recognition. The multi-kernel convolution approach uses three depth-wise separable convolutions for feature extraction; convolution kernels of multiple sizes and fused details are obtained simultaneously, along with edge contour details of facial expressions, to design a lightweight facial expression network. The experimental results showed an accuracy of 73.3% on FER-2013 and CK+ using this approach. Minaee et al. [39] proposed a convolutional network with fewer than 10 CNN layers for emotion detection, evaluated on JAFFE, CK+, FER-2013 and FERG; a visualisation technique was used to find the face regions important for recognising the facial expressions of different emotions. It can be concluded from these studies that facial expressions play an important role in the emotion recognition process. However, prior studies have focused more on image-level facial expression recognition than on real-time environments, little work has been done on detecting engagement from facial expressions, and the existing engagement detection studies were not implemented for an online educational environment. The current studies therefore provide a direction for qualitative and quantitative research on real-time engagement detection. The purpose of the present study is to use deep learning models to automatically detect the engagement states of online learners from their facial expressions.

3 Datasets

In this study, the proposed model uses standard, benchmarked, publicly available datasets that work efficiently in real-time scenarios. A brief overview of these datasets is given below.

3.1 Wider face

Wider Face dataset [58] is one of the largest datasets for face detection having 32,203 coloured images and 393,703 labelled faces. It is a subset of the publicly available Wider dataset. The summary of the Wider Face dataset is given in Table 1 and its sample images are shown in Fig. 3(a).

Table 1 Summary of WIDER face dataset
Fig. 3
figure 3

Examples images from (a) Wider Face (b) FER-2013 (c) CK+ (d) RAF-DB (e) Own dataset

3.2 FER-2013 (Facial expression recognition 2013)

FER-2013 (Facial Expression Recognition 2013) is the most popular facial expression dataset introduced in the Representation learning challenge of ICML (Kaggle facial expression recognition challenge) held in 2013 [17]. The summary of FER-2013 dataset is given in Table 2 and its sample images are shown in Fig. 3(b).

Table 2 Summary of FER-2013 dataset

3.3 CK+ (extended cohn-kanade dataset)

CK+ (Extended Cohn-Kanade Dataset) is a widely used facial expression recognition dataset and is the extended version of the CK dataset [18]. The summary of CK+ dataset is given in Table 3 and its sample images are shown in Fig. 3(c).

Table 3 Summary of CK+ dataset

3.4 RAF-DB (real-world affective faces)

RAF-DB [32] is a large-scale facial expression recognition dataset with 29,672 coloured images. The summary of the RAF-DB dataset is given in Table 4 and its sample images are shown in Fig. 3(d).

Table 4 Summary of RAF-DB dataset

3.5 Own dataset

We collected and labelled a new dataset for facial expression recognition. The images are labelled with the six basic facial expressions. The summary of this dataset is given in Table 5 and its sample images are shown in Fig. 3(e).

Table 5 Summary of Own dataset

4 Proposed approach for engagement detection system

This paper proposes an engagement detection system that calculates an online learner's EI (engagement index) using a facial emotion recognition approach to predict the engagement state in an online learning environment. The overview of the proposed engagement detection system is shown in Fig. 4. In the first step, images are captured using the built-in camera of the device through which the learner is studying the online content. The learner's face is detected using the Faster R-CNN model [20]. Then, the key points of the facial features are extracted from the detected face using the proposed modified face-point extractor (MFACEXTOR). Individual facial emotion classification is performed by deep convolutional neural networks (CNNs) from the detected face and the extracted facial key-point information. The emotions predicted from the key image frames are combined into the EI to determine the engagement level of an individual learner in the online learning environment. Each step is described in detail in this section.

Fig. 4
figure 4

Proposed framework for engagement detection system

Additionally, the modified face-point extractor (MFACEXTOR) is used for face-point extraction from the detected face area. Individual facial emotion classification is performed using a deep convolutional neural network (CNN) architecture trained with appropriate data. The emotion classification output of the individual learner is then used to calculate the engagement index (EI), which determines the learner's engagement state, i.e. engaged or disengaged.

4.1 Automatic frame selection

As learners learn through online videos, the input to the proposed system is the video stream from the web camera. Frame-based processing is used to extract discriminative features from the video stream, but not all frames are useful for face detection, so frame selection is performed to obtain the frames best suited for it. The proposed system extracts an image from the video stream after every specific period (every 20 seconds). Extracted images are buffered in memory and saved with unique frame numbers, Frame-1 to Frame-n.
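The following is a minimal sketch of this periodic frame selection, assuming OpenCV (`cv2`) for capture; the device index, the five-frame cap, and the output file naming are illustrative, while the 20-second interval follows the description above.

```python
import time
import cv2

CAPTURE_INTERVAL_S = 20                 # one frame every 20 seconds, as above
MAX_FRAMES = 5                          # small cap for illustration only

cap = cv2.VideoCapture(0)               # built-in web camera (index assumed)
frames = []
last_grab = 0.0
frame_no = 0

while cap.isOpened() and frame_no < MAX_FRAMES:
    ok, frame = cap.read()
    if not ok:
        break
    now = time.time()
    if now - last_grab >= CAPTURE_INTERVAL_S:
        frame_no += 1
        frames.append(frame)                          # buffer in memory
        cv2.imwrite(f"Frame-{frame_no}.png", frame)   # save with unique number
        last_grab = now

cap.release()
```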

4.1.1 Face detection based on faster R-CNN (Region-based convolutional neural network)

Traditional face detection models suffer from low resolution, model complexity, and a complex explicit feature extraction process on large image datasets [52], and they are less tolerant of variations in occlusion, illumination, pose, and expression. Recently, Faster R-CNN has overcome these issues and shown impressive results for face detection. Therefore, in this paper, Faster R-CNN [48] is used for face detection. It consists of two modules: the first is the Region Proposal Network (RPN) and the second is a detector that refines the proposals. The basic building block of Faster R-CNN is shown in Fig. 5. Two outputs are produced for every object: a class label and the bounding box coordinates.

Fig. 5
figure 5

Building block of Faster R-CNN

The major part of Faster R-CNN is the Region Proposal Network (RPN). In the pre-trained model, the RPN consists of 3x3 convolution layers whose filters scan the input image to extract features, reducing the large spatial window, i.e., 224x224x3 (for VGG19), to a low-dimensional feature vector.

The visualisation of five channels of the 50x50x512 feature maps is shown in Fig. 6. Using the IoU (Intersection over Union) measure, the system keeps only the detected face region of the image and ignores the rest; only anchors with positive IoU are retained to reduce anchor congestion. The first five ROI (region of interest) feature maps after the ROI pooling step are shown in Fig. 7.
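As an illustration of this detection stage, the sketch below uses the generic pretrained Faster R-CNN shipped with recent torchvision releases (COCO weights, not the WIDER-Face-tuned detector used in this work); the score threshold is an assumed stand-in for the positive-IoU filtering described above.

```python
import torch
import torchvision
from torchvision.transforms.functional import to_tensor
from PIL import Image

# Generic Faster R-CNN detector, used only to illustrate the
# propose-and-refine interface of the RPN plus refinement head.
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")
model.eval()

image = to_tensor(Image.open("Frame-1.png").convert("RGB"))
with torch.no_grad():
    out = model([image])[0]              # dict with "boxes", "labels", "scores"

SCORE_THRESHOLD = 0.8                    # assumed confidence cut-off
keep = out["scores"] > SCORE_THRESHOLD
for (x1, y1, x2, y2) in out["boxes"][keep].tolist():
    # Crop the detected region (ROI) for the later landmark/emotion stages.
    roi = image[:, int(y1):int(y2), int(x1):int(x2)]
```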

Fig. 6
figure 6

Visualization of five channels extracted 50 x 50 x 512 feature maps for a input image

Fig. 7
figure 7

Visualization of first five ROI’s feature maps after ROI pooling

The hyper-parameters used for the Faster R-CNN model are listed in Table 6.

Table 6 Hyper-Parameters for Faster R-CNN model

The loss function for the Faster R-CNN is as follows.

$$ L(\{\theta_{i}\},\{\tau_{i}\}) = \frac{1}{N_{class}} \sum\limits_{i} L_{class} (\theta_{i}, \varphi_{i}^{*})+ \lambda \frac{1}{N_{regress}} \sum\limits_{i} \varphi_{i}^{*} L_{regress}(\tau_{i}, \tau_{i}^{*}). $$
(1)

Here, i is the index of an anchor in a mini-batch and \(\theta_{i}\) is the predicted probability that anchor i contains a face. \(\varphi_{i}^{*}\) is the ground-truth label, with \(\varphi_{i}^{*}=1\) if the anchor is positive and \(\varphi_{i}^{*}=0\) if the anchor is negative. \(\tau_{i}\) is a vector of the four parameterised coordinates of the predicted bounding box and \(\tau_{i}^{*}\) is the ground-truth box associated with a positive anchor. \(L_{class}\) is the log loss over the two classes and \(L_{regress}\) is the regression loss; the term \(\varphi_{i}^{*}L_{regress}\) indicates that the regression loss is activated only for positive anchors.
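For clarity, a minimal PyTorch sketch of this multi-task loss is given below; the smooth-L1 regression loss and the balancing weight λ = 10 follow the original Faster R-CNN formulation and are assumptions, not values reported in this paper.

```python
import torch
import torch.nn.functional as F

def faster_rcnn_loss(cls_logits, labels, box_preds, box_targets, lam=10.0):
    """Multi-task loss of Eq. (1): classification term plus weighted regression term."""
    # cls_logits: (N, 2) objectness scores per anchor
    # labels:     (N,) long tensor, 1 = positive anchor, 0 = negative anchor
    n_class = cls_logits.shape[0]
    l_class = F.cross_entropy(cls_logits, labels, reduction="sum") / n_class

    pos = labels == 1                          # phi_i* gates the regression term
    n_regress = max(int(pos.sum()), 1)
    l_regress = F.smooth_l1_loss(box_preds[pos], box_targets[pos],
                                 reduction="sum") / n_regress
    return l_class + lam * l_regress
```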

4.1.2 Modified face-points extractor

Facial features are extracted to identify the corresponding emotion [47]. The facial features define the shape of the face, which consists of elements such as the lips, nose, eyes, and mouth [11]. To extract the facial features in the present work, a pre-trained model from MediaPipe Face Mesh is used. The image vector received from the Faster R-CNN contains the ROI of the face area. Key points are extracted from the face image using the geometric information of the facial features provided by MediaPipe Face Mesh, which employs a machine-learning-based detector to estimate the face geometry: the detector detects the face key points, and its 3D model predicts the face geometry surface. The different facial features are given in Table 7 and their key points are shown in Fig. 8.

Table 7 Facial features Key points
Fig. 8
figure 8

Facial features with key points

The aim is to analyse the muscle motions of the face. The facial muscle motions are analysed by extracting the key points of each facial feature using the geometry pipeline component of Face Mesh. The key points of these facial features are described using (2) below.

$$ \{(f_{1},p_{1}),(f_{2},p_{2}),\ldots,(f_{n},p_{n})\}, \quad f=(f_{1},\ldots,f_{n},p_{1},\ldots,p_{n})^{T} $$
(2)

Here, \(f\) is a 2N-dimensional vector formed from the key-point coordinates \((f_{i},p_{i})\), and T denotes the transpose, used to identify the facial pose.

The facial expression recognition task requires many key points. The number of key points for each feature is given in Table 8 and can be seen in Fig. 8(b). In this work, 470 key points are extracted rather than the 468 detected by the standard MediaPipe Face Mesh implementation, as shown in Fig. 8(b); the additional key points improve the reliability of facial emotion recognition in the online learning environment. The Euclidean distance between any two facial key points is given in (3) below.

$$ \sqrt{(f_{2}-f_{1})^{2}+(p_{2}-p_{1})^{2}} $$
(3)

Here, \((f_{1},p_{1})\) and \((f_{2},p_{2})\) denote the coordinates of the two key points.

Table 8 Facial features Key points

The area enclosed by the key points of each facial feature, whose coordinates are known (Fig. 8(a)), is calculated using (4):

$$ \left| \frac{(f_{1}p_{2}-p_{1}f_{2})+(f_{2}p_{3}-p_{2}f_{3})+\ldots+(f_{n}p_{1}-p_{n}f_{1})}{2} \right| $$
(4)

In this way, 470 key points are extracted from the face image, as shown in Fig. 8(b). These facial features are given as input to the deep neural network, which learns from the face points to decide the final output emotion. The key-point representations of different emotions are shown in Fig. 9.
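The sketch below illustrates this key-point extraction and the geometric measures of (3) and (4), assuming the standard MediaPipe Face Mesh solution; it returns the standard 468 landmarks and does not reproduce the two additional points introduced by MFACEXTOR.

```python
import math
import cv2
import mediapipe as mp

def extract_keypoints(image_bgr):
    """Return (x, y) pixel coordinates of the MediaPipe Face Mesh landmarks."""
    h, w = image_bgr.shape[:2]
    with mp.solutions.face_mesh.FaceMesh(static_image_mode=True,
                                         max_num_faces=1) as mesh:
        result = mesh.process(cv2.cvtColor(image_bgr, cv2.COLOR_BGR2RGB))
    if not result.multi_face_landmarks:
        return []
    landmarks = result.multi_face_landmarks[0].landmark
    return [(lm.x * w, lm.y * h) for lm in landmarks]   # 468 standard points

def keypoint_distance(p1, p2):
    # Euclidean distance between two key points, cf. Eq. (3)
    return math.hypot(p2[0] - p1[0], p2[1] - p1[1])

def feature_area(points):
    # Shoelace formula over an ordered key-point contour, cf. Eq. (4)
    total = 0.0
    for (f1, q1), (f2, q2) in zip(points, points[1:] + points[:1]):
        total += f1 * q2 - q1 * f2
    return abs(total) / 2.0
```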

Fig. 9
figure 9

Key-points representation of different emotions

4.2 Facial emotion recognition using deep learning models

A learner's emotional state directly influences the learning process, so it is important for online instructors to gauge learners' moods using facial indicators. In a face-to-face classroom, teachers identify the emotional state of learners by observing them, but in an online classroom such observation is not possible due to physical unavailability. Therefore, most researchers have focused on automatic recognition of facial emotion to assess an online learner's frame of mind [14, 32, 33]. Some methods represent facial expression with a set of localised face movements (the facial action coding system) [28]. Recent deep convolutional neural networks (CNNs) identify changes in emotion and perceive the emotional state of online learners while the class is in progress. The present work recognises the facial emotion state by interpreting facial features and analysing expressions. Three deep CNN-based models, Inception-V3, VGG19, and ResNet-50, are evaluated one by one in this study; their architectures are shown in Fig. 10, and a brief description of each network is given below.

Fig. 10
figure 10

Architectures of the models used for facial emotion recognition (FER): (a)VGG19 (b)ResNet-50 and (c) Inception-V3

4.2.1 Inception-V3

Inception-V3 is a 48-layer deep neural network model [24]. Figure 11 shows the detailed Inception-V3 architecture used in this method. It was developed with computational efficiency in mind to assist image analysis. The input image size for Inception-V3 in this study is 299x299x3. The model is composed of symmetric and asymmetric building blocks, beginning with five convolution-stem layers. It also comprises three inception blocks of type A followed by a reduction block of type A, five inception blocks of type B followed by a reduction block of type B, and two inception blocks of type C followed by a reduction block of type C. These blocks are followed by an average pooling layer and then a fully connected layer of size 2048x1x1 as the final layer. Factorisation is used to reduce the size of the deep neural network and avoid overfitting, and other techniques such as smaller convolution layers, regularisation, dimension reduction, and parallel computation make the network more efficient. Softmax is used in our approach to compute the loss in this model.

Fig. 11
figure 11

Network Architecture of Inception-V3

4.2.2 VGG19

VGG19 is a 19-layer deep neural network model [25], and Fig. 12 shows the detailed VGG19 architecture used in this approach. The input image size considered for this model is 224x224x3. It consists of sixteen convolution layers, with a max-pooling layer after each convolution block, and three fully connected (FC) layers. To avoid overfitting, the model uses dropout in the FC layers to improve generalisation. The three FC layers comprise 4096, 4096, and 1000 channels respectively. VGG19 uses 3x3 convolution kernels with a stride of 1 pixel and 2x2 max-pooling kernels with a stride of 2 pixels. This study aims to classify six basic emotions with an input image size of 48x48. The input image passes through the convolution layers, with a max-pooling layer after each convolution block, and then through the three fully connected layers to classify the facial expressions of different emotions.

Fig. 12
figure 12

Network architecture of VGG19

4.2.3 ResNet-50

ResNet-50 is a 50-layer deep neural network model [29], and Fig. 15 shows the detailed ResNet-50 architecture used in this approach. ResNet stands for residual network, so called because each layer learns from the residual of the previous layer. Rather than relying purely on network depth like most other models, ResNet-50 uses the residual from the previous layer to learn more features; in other words, it considers the input value plus the current output to make predictions, which mitigates the vanishing gradient problem and improves accuracy. The vanishing gradient problem arises while training extremely deep networks, where accuracy starts degrading as more layers are added. To address this, ResNet-50 introduces skip connections: the original input is added to the output of the convolution block, as shown in Fig. 13. The skip connection prevents the higher layers from degrading by letting information flow from the lower layers to the higher layers. The input image size for this model is 224x224x3. ResNet-50 comprises five stages, each consisting of a convolution block and an identity block, both with three convolution layers. The three layers use 1x1, 3x3, and 1x1 filters; the 1x1 kernels are responsible for dimension reduction and restoration. The first convolution layer uses a 7x7 kernel and the pooling layer a 3x3 kernel. The residual block is the main block that connects the original input with the predictions, as shown in Fig. 14, where F(z) is the residual learned by the block and z is the original input; when the desired mapping equals the input, F(z) is zero and z is simply copied forward by the identity connection. The five stages are followed by an average pooling layer and then the fully connected layer, which is the final layer. In this study, ResNet-50 is considered the base model to classify the different facial emotions of the online learner for the proposed real-time engagement detection system (Fig. 15).
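To make the classification setup concrete, here is a minimal tf.keras sketch (an assumed toolkit, not the authors' exact training code) of a ResNet-50 backbone topped with a six-class softmax head; Inception-V3 or VGG19 backbones can be swapped in via tf.keras.applications in the same way.

```python
import tensorflow as tf
from tensorflow.keras import layers, models

NUM_EMOTIONS = 6   # angry, sad, happy, neutral, surprised, fear

def build_emotion_classifier(input_shape=(224, 224, 3)):
    """ResNet-50 feature extractor followed by a six-class softmax classifier."""
    base = tf.keras.applications.ResNet50(include_top=False,
                                          weights="imagenet",
                                          input_shape=input_shape)
    x = layers.GlobalAveragePooling2D()(base.output)
    outputs = layers.Dense(NUM_EMOTIONS, activation="softmax")(x)
    model = models.Model(base.input, outputs)
    model.compile(optimizer="adam",
                  loss="categorical_crossentropy",
                  metrics=["accuracy"])
    return model

model = build_emotion_classifier()
```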

Fig. 13
figure 13

Skip connection

Fig. 14
figure 14

Single residual block

Fig. 15
figure 15

Network architecture of ResNet-50

4.3 Engagement detection system

The proposed system is motivated by the real-world problem of monitoring learners' engagement levels during online studies. It uses the web camera to obtain real-time information about the learner's facial emotion. This information is used to determine the engagement index (EI), which yields one of two states, engaged or disengaged, together with the level of engagement (in percentage). In our case, six classes of facial emotion are recognised, and these classes are further categorised as engaged or disengaged, as shown in Table 9.

Table 9 The Engagement categories based on the emotion pattern and EP value from Algorithm 1

The predicted emotion acts as input for deciding the engagement state. The engagement index is calculated from the predicted emotion's value according to (5).

$$ EI=EP \times WE $$
(5)

where EP = emotion probability (for the emotions Neutral, Angry, Sad, Happy, Surprised and Fear) and WE = weight of the corresponding emotion.

The EP (emotion probability) score is generated by the deep CNN classifier, and the corresponding emotion weight (WE) describes the value of that emotional state in reflecting the learner's engagement at that instant of time. The engagement percentage is calculated from the engagement index given in (5). The scaled weights for the corresponding emotions are given in Table 10. An algorithm to calculate the engagement based on facial emotion recognition is given below (Algorithm 1).

Table 10 Weight for corresponding emotion table
Algorithm 1
figure a

Engagement evaluation.

This describes the background processing of the proposed system. In the front end, the engagement state is predicted as either engaged or disengaged along with the level of engagement (in percentage), and the facial emotion category is highlighted based on the learner's facial expressions.
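A minimal sketch of this engagement evaluation step is shown below; the emotion weights and the engaged/disengaged threshold are hypothetical placeholders for the values in Table 10 and Algorithm 1, which are not reproduced here.

```python
# Hypothetical emotion weights standing in for Table 10 (actual values not shown),
# and an assumed threshold separating the engaged and disengaged states.
EMOTION_WEIGHTS = {"neutral": 0.9, "happy": 1.0, "surprised": 0.6,
                   "sad": 0.3, "angry": 0.25, "fear": 0.3}
ENGAGEMENT_THRESHOLD = 0.5

def evaluate_engagement(emotion_probs):
    """EI = EP x WE for the top-scoring emotion of a frame, cf. Eq. (5)."""
    emotion, ep = max(emotion_probs.items(), key=lambda kv: kv[1])
    ei = ep * EMOTION_WEIGHTS[emotion]
    state = "engaged" if ei >= ENGAGEMENT_THRESHOLD else "disengaged"
    return emotion, round(ei * 100, 1), state      # engagement level in percent

# Softmax output of the emotion classifier for one captured frame (illustrative)
probs = {"happy": 0.72, "neutral": 0.15, "surprised": 0.05,
         "sad": 0.04, "angry": 0.02, "fear": 0.02}
print(evaluate_engagement(probs))                  # ('happy', 72.0, 'engaged')
```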

5 Experimental results and analysis

The experimental analysis of the proposed approach is presented in this section. First, the performance of the proposed model is compared with previous work by re-implementing that work on different datasets. The effectiveness of the proposed emotion recognition approach is demonstrated by conducting extensive experiments with deep CNN-based models, namely Inception-V3, VGG19, and ResNet-50. Then, the visualisation of the real-time learner engagement detection system using the proposed model is provided. Lastly, the proposed system is compared with existing work.

5.1 Evaluation metrics

The models are evaluated on four performance metrics: accuracy, precision, recall, and F1-score [41, 42]. These metrics are defined in terms of false negatives (A), false positives (B), true negatives (C), and true positives (D), as given below.

Accuracy

$$ \frac{C+D}{A+B+C+D} $$
(6)

Precision

$$ \frac{D}{B+D} $$
(7)

Recall

$$ \frac{D}{A+D} $$
(8)

F1-Score

$$ F1\_Score = \frac{2 \times precision\times recall}{precision+ recall} $$
(9)
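For reference, the sketch below computes these four metrics from raw counts using the letter convention above (A = false negatives, B = false positives, C = true negatives, D = true positives); the example counts are illustrative only.

```python
def classification_metrics(fn, fp, tn, tp):
    """Accuracy, precision, recall and F1 from raw counts
    (A = fn, B = fp, C = tn, D = tp in the notation of Section 5.1)."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return accuracy, precision, recall, f1

# Illustrative counts only
print(classification_metrics(fn=20, fp=10, tn=90, tp=80))
```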

5.2 Result analysis for FER-2013 dataset

The proposed model and the previous works are re-implemented on the FER-2013 dataset. The proposed model achieves an accuracy of 73.40%, and the accuracy of the previous works also increases by approximately 1.7%-1.9% after embedding MFACEXTOR. The comparison of the proposed model with some of the previous works on the FER-2013 dataset is provided in Table 11. The proposed model is evaluated with three different backbones, i.e., Inception-V3 (PROPOSED+Inception-V3), VGG19 (PROPOSED+VGG19), and ResNet-50 (PROPOSED+ResNet-50). The accuracy and loss of the proposed model on the FER-2013 dataset are shown in Figs. 16(a) and 16(b) respectively.

Table 11 Performance measure of FER-2013 dataset
Fig. 16
figure 16

Analysis of (a) training versus testing accuracy and (b) training versus testing loss for the FER-2013 dataset using the proposed models, (c) training versus testing accuracy and (d) training versus testing loss for the CK+ dataset using the proposed models at different epochs (e) training versus testing accuracy and (f) training versus testing loss for the RAF-DB dataset using the proposed models at different epochs

5.3 Result analysis for CK+ dataset

The proposed model and the previous works are re-implemented on the CK+ dataset, with 70% of the images used for training, 10% for validation, and 20% for testing. The PROPOSED+ResNet-50 model achieves the highest accuracy of 89.56%. CK+ gives good accuracy because it is a laboratory-controlled dataset. The accuracy of the previous works also increases by approximately 1.23%-1.35% after adding MFACEXTOR. The comparison of the proposed model with some of the previous works on the CK+ dataset is provided in Table 12. The accuracy and loss of the proposed model on the CK+ dataset are shown in Figs. 16(c) and 16(d) respectively.

Table 12 Performance measure of CK+ dataset

5.4 Result analysis for RAF-DB dataset

The proposed model and the previous works are re-implemented on the RAF-DB dataset for training and testing. RAF-DB consists of multiple image classes. The PROPOSED+ResNet-50 model achieves an accuracy of 76.72%, and the accuracy of the previous works also increases by approximately 0.47%-1.87% after adding MFACEXTOR. The comparison of the proposed model with some of the previous works on the RAF-DB dataset is provided in Table 13. The accuracy and loss of the proposed model on the RAF-DB dataset are shown in Figs. 16(e) and 16(f) respectively.

Table 13 Performance measure of RAF-DB dataset

5.5 Result of own test data

Table 14 shows the evaluation results for Inception-V3, VGG19, and ResNet-50 and for PROPOSED+Inception-V3, PROPOSED+VGG19, and PROPOSED+ResNet-50 in terms of accuracy, precision, recall, and F1-score. The three deep learning models are evaluated for facial expression recognition. Among them, PROPOSED+ResNet-50 achieves the highest accuracy of 92.32% on the test data, followed by PROPOSED+VGG19 with 90.14% and PROPOSED+Inception-V3 with 89.11%. The proposed model also significantly improves the F1-score, precision, and recall for the PROPOSED+ResNet-50 model.

Table 14 Performance measure of Own dataset

The confusion matrices of the FER (facial expression recognition) predictions for the PROPOSED+Inception-V3, PROPOSED+VGG19 and PROPOSED+ResNet-50 architectures are shown in Figs. 17(a), 17(b), and 17(c). From the confusion matrix perspective, the best overall results are shown by PROPOSED+ResNet-50 (Fig. 17(c)), followed by PROPOSED+VGG19 (Fig. 17(b)) and PROPOSED+Inception-V3 (Fig. 17(a)). Regardless of the network model, the highest-scoring class is “happy”, followed by the “neutral” and “surprise” classes. Predicting the “happy” and “neutral” classes is more accurate because the training set contains a large number of “happy” and “neutral” images with high variance. As observed from the confusion matrices, classes with low variance can be mistaken for each other; for example, “fear” can be mistaken for “sad”, and “sad” for “surprise”. This happens because the shape of the eyebrows and mouth varies little between the “fear” and “sad” classes, making them difficult to distinguish; it can be improved by adding more images of the same emotion with greater variation in expression. The training and testing loss and accuracy for PROPOSED+Inception-V3, PROPOSED+VGG19 and PROPOSED+ResNet-50 are shown in Figs. 18(a) to 18(f): Figs. 18(a) and 18(b) show the accuracy and loss for PROPOSED+Inception-V3, Figs. 18(c) and 18(d) show them for PROPOSED+VGG19, and Figs. 18(e) and 18(f) show the highest accuracy and lowest loss on the own dataset for the PROPOSED+ResNet-50 model.

Fig. 17
figure 17

Normalized confusion matrix for (a) PROPOSED+Inception-V3 (b) PROPOSED+VGG19 (c) PROPOSED+ResNet-50 models which trained with (FER2013+ (CK+)+RAF-DB) and tested on OWN dataset

Fig. 18
figure 18

Analysis of (a) training versus testing accuracy and (b) training versus testing loss for the own dataset using the Inception-V3 model, (c) training versus testing accuracy, and (d) training versus testing loss for the own dataset using the VGG19 model, (e) training versus testing accuracy and (f) training versus testing loss for the own dataset using the RESNET50 model at different epochs

To summarise, the results of the different models on the different datasets are represented graphically in Fig. 19, which makes clear that PROPOSED+ResNet-50 outperforms all other models on every dataset. Hence, the next section presents the visual demonstration of the proposed engagement detection system with the PROPOSED+ResNet-50 model only, as its accuracy was the highest.

Fig. 19
figure 19

Graphical representation of overall accuracy evaluation in percentage over different datasets

6 Visualisation of the engagement detection system using facial emotion recognition

The trained model is deployed on a real-time system, and its visualisation is shown in Fig. 20. Twenty undergraduate learners participated in this study. Each learner watched a 29-minute online learning video covering different topics of the basics of DBMS. The video consisted of engaging and informative content explained with the help of colourful diagrams and equations. The web camera captures the learner throughout the learning session to monitor the learner's engagement state. The engagement state and the engagement level (in percentage) are predicted continuously from a frame captured every 20 seconds.

Fig. 20
figure 20

Visual framework of real-time engagement detection system

The confusion matrix for real-time emotion recognition of the different learners is shown in Table 15. The emotion labels are represented by green boxes over the images. Figure 21 shows snapshots of different students along with the results obtained by the proposed real-time learner engagement detection system. Figures 22(a), 22(b) and 22(c) show the engagement index plots of three learners over the whole video session. The emotion value (in %) and the overall emotion pattern through the learning session are shown in Figs. 23(a) and 23(b) respectively.

Table 15 learners’ facial emotion recognition results
Fig. 21
figure 21

Example of real-time engagement detection system of learner-1, learner-2 and learner-3

Fig. 22
figure 22

Plotting of real-time engagement index for (a)learner-1 (b)learner-2 (c)learner-3 over the complete online learning video

Fig. 23
figure 23

Plotting of real-time engagement index over the complete video

7 Comparison with existing systems

The proposed learner engagement detection system using facial emotions outperforms other existing engagement detection works in terms of accuracy, as shown in Table 16.

Mohamad Nezami et al. [43] presented an engagement model trained on the ER dataset of 4,627 grey-scale images. The authors evaluated CNN, VGGNet, and HOG+SVM models on the FER-2013 and ER datasets; according to the testing results, the engagement model achieved the best accuracy of 72.38%. In another work, [10] used the Local Directional Pattern (LDP), a robust and person-independent edge feature extraction technique, on an image dataset. Kernel Principal Component Analysis (KPCA) was applied for dimensionality reduction, and a deep belief network (DBN) was then used for engagement classification on the features obtained from KPCA. The LDP-KPCA-DBN model showed efficient results on the CK+ dataset [2], which consists of 568 video snippets of approximately 10 seconds each and 800 pictures; the overall testing accuracy was 87.25%. A ResNet+TCN model based on an end-to-end neural network architecture is proposed by [2] and evaluated on the FER-2013 and CK+ datasets. The main novelty of this architecture is that a ResNet model is embedded for each frame and the outputs are combined at the TCN layer to classify engagement; the overall accuracy was 63.9%. Liao et al. [34] proposed the Deep Facial Spatio-temporal Network (DFSTN) for engagement prediction, combining two modules: SE-ResNet-50 (SENet) to extract spatial features and a Long Short-Term Memory (LSTM) network to generate global attention. The DFSTN model was evaluated on the RAF-DB dataset [34] with a testing accuracy of 73.6%. These are recent outcomes of online learner engagement systems with different datasets and models, and our proposed model performs considerably better than them. To train the proposed facial emotion recognition model for real-time use, the FER-2013, CK+, RAF-DB, and our own datasets are used. We evaluated Inception-V3, VGG19, and ResNet-50, as they are well-established deep CNN models for real-time emotion recognition; after experimentation, their accuracies are 89.11%, 90.14%, and 92.32%, respectively. One of the main reasons for these accuracies is the better face detection performed by the Faster R-CNN model pre-trained on the WIDER Face dataset: the input image is processed efficiently, the background area is ignored, and only the important features are considered for facial expression recognition. The dimensionality is reduced to a certain extent to provide a streamlined face-point encoding with the help of MFACEXTOR. The six emotion classes are then classified, and their output is used to calculate the engagement index to predict an online learner's engagement state. Table 16 gives a brief overview of the comparison between the existing work and our proposed model; the proposed system with ResNet-50 proves to be one of the most efficient models for a facial-emotion-based real-time engagement detection system.

Table 16 Comparison of the existing models with a proposed system for Real-time Engagement Detection

8 Conclusion and future scope

With the increasing usage of digital platforms during the COVID-19 pandemic, one of the biggest challenges is to have a system that determines the engagement of online learners when no instructor is physically present. This paper proposes a new approach for a real-time engagement detection system based on deep learning models. The learner's engagement is detected by automatically analysing facial emotions while studying online; the facial emotion state is recognised by observing changes in facial expression during the ongoing learning video session in a real-time learning environment. The system analyses facial expressions through the built-in web camera and uses that information to calculate the engagement index (EI), which gives the output as "engaged" or "disengaged". The six basic emotions contribute to predicting the engagement states, which act as real-time feedback. This information helps the instructor understand the learner's online learning experience and contributes to improving it by supporting online learners when they are found not to be engaged with the learning content. The proposed system utilises Faster R-CNN for face detection and MFACEXTOR for face-point extraction. It is trained on the FER-2013, CK+, RAF-DB, and own datasets and tested on the newly created own dataset for automatic facial emotion recognition. Deep learning models, namely Inception-V3, VGG19, and ResNet-50, have been evaluated to compare their performance. The experimental results show that ResNet-50 outperforms the Inception-V3 and VGG19 models for classifying facial emotion in real-time scenarios, with an accuracy of 92.32% on our own dataset and strong results on the other publicly available benchmark datasets. The proposed system was tested on 20 learners in an online learning scenario and correctly detected the "engaged" and "disengaged" states based on automatic facial emotion recognition, outperforming the methods of existing work. In the future, the information retrieved by the proposed system can be combined with information from other sensors such as heart rate and EEG signals, the model can be applied to learners with special needs, engagement can also be measured from eye movements and body movements in addition to facial emotions, and a larger dataset can be created to train and test the proposed approach.