Abstract
Modern teaching has advanced considerably, with much advanced equipment and technology introduced into the teaching process. Experimental teaching is an important part of engineering design courses, yet teaching resources are limited, and engineering students need effective guidance during their limited laboratory time. We introduce artificial intelligence into engineering education, applying it to classroom behavior analysis in order to improve the teaching effectiveness of engineering design practice courses. In an instructional setting, image acquisition devices such as cameras capture data in real time, enabling the identification and counting of students' emotional states. Concurrently, analytical software gauges the students' interaction patterns and performs cluster analysis. This multifaceted information provides valuable insight into students' educational engagement, allowing educators to tailor their approach and thereby foster better pedagogical outcomes. The emotion recognition model we have developed, ERAM, combines a rapid response time with dependable accuracy, making it a robust tool for classroom deployment. In contrast to conventional post-lesson evaluation, our technique furnishes immediate feedback throughout instruction, promoting timely intervention and adaptive teaching strategies. A control group experiment showed that the intelligent system improved teaching effectiveness by 8.44%. Such systems can help teachers understand students' learning status and improve the quality of laboratory teaching in engineering design courses.
1 Introduction
Smart education, also known as technology-enabled or digital education, refers to the integration of advanced technologies and digital tools to enhance teaching and learning processes [1]. It leverages the power of technology to create interactive and personalized learning experiences, improve educational outcomes, and facilitate more efficient and effective educational practices. Smart education encompasses various applications and tools that transform traditional educational approaches.
The applications of smart education can be divided into two categories: those serving learners, and those serving teachers and teaching managers. There are many applications for learners, such as personalized learning and interactive learning resources [2]. Personalized learning aims to provide learning experiences tailored to individual students' needs and preferences: adaptive learning systems use algorithms and data analytics to assess students' strengths, weaknesses, and learning styles, allowing for customized instruction and adaptive content delivery. Interactive learning resources engage students in active learning through interactive digital media such as multimedia content, simulations, virtual reality (VR), and mixed reality (MR) [3]. These resources help visualize complex concepts, create immersive learning environments, and enhance students' understanding and retention of information. However, few applications serve teachers and teaching managers.
Teachers and teaching management play a vital role in the education system as they are essential components in creating effective and meaningful learning experiences for students [4]. Effective teaching management is crucial for maintaining an organized and conducive learning environment. Teachers establish routines, manage time efficiently, and maintain discipline to minimize disruptions and maximize instructional time. A well-managed classroom enhances student engagement, collaboration, and focus, contributing to improved learning outcomes. Teachers assess students' learning progress through various methods. They provide feedback, identify areas for improvement, and offer guidance for further development [5]. Ongoing monitoring and evaluation of student performance allow teachers to make data-driven instructional decisions and adapt teaching strategies as needed.
However, teachers' attention is finite, and they cannot attend to every student. During computer laboratory teaching, teachers must manage classroom order, track students' progress on assignments, and assist students in need, among other tasks. Only by understanding students' learning status can teachers effectively fulfill the tasks of computer laboratory teaching, which poses a real challenge. This study, set in the experimental teaching of engineering design courses, provides teaching assistance services for teachers. We propose a solution that presents teachers with the results of facial emotion recognition and the frequency of students' mouse and keyboard usage in the computer laboratory. On this basis, teachers can judge students' learning status, intervene in a timely manner, and address students' learning difficulties.
1.1 Computer Laboratory Teaching
Computer laboratories in universities hold significant importance for engineering students. First, computer laboratories provide students with access to computers, software applications, and specialized equipment that they may not have access to outside the laboratory. This enables them to develop essential digital literacy skills and work on assignments, projects, and research that require computer-based resources. Second, computer laboratories offer a practical learning environment where students can apply theoretical concepts and principles they have learned in their classes. They can engage in hands-on activities, conduct experiments, and gain practical experience in using various software tools and technologies. Third, computer laboratories typically have technical support staff available to assist students with hardware or software issues. This support ensures that students can overcome technical challenges and focus on their learning goals without disruptions. Overall, computer laboratories play a crucial role in supporting teaching, learning, research, and technological advancement in universities. They provide students with the necessary resources and environment to develop their technical skills, apply knowledge, collaborate, and prepare for future careers in a technology-driven world.
However, computer laboratories in Chinese universities face challenges due to their specific circumstances.
Among these, large class sizes present the first significant hurdle, potentially affecting the individual attention each student can receive. Chinese universities often have a high student-to-teacher ratio, which can make it challenging for instructors to provide personalized attention and support to each student during computer laboratory sessions. As seen in Fig. 1, in the computer laboratory of a Chinese university, a teacher is responsible for managing and tutoring practical courses for 40 or more students. It becomes difficult to address individual questions and concerns effectively.
The second challenge stems from the diverse backgrounds of the students, which may require tailored teaching approaches to accommodate different learning needs. Chinese universities typically enroll students with varying levels of prior computer literacy and technical skills. In a computer laboratory setting, instructors may find it challenging to cater to the needs of students with different skill levels simultaneously, particularly in courses that have mixed-level participants.
Additionally, the third challenge is the limitation in available devices and the time allocated for hands-on practice, which is crucial for skill development in computer studies. Due to time constraints within the academic curriculum, it may be difficult to allocate enough time for students to engage in hands-on practice and experimentation in the computer laboratory. This can impact their ability to fully grasp and apply the concepts being taught.
It is important to note that these difficulties are not exclusive to Chinese universities and can be relevant in educational institutions worldwide. Universities continuously work to address these challenges by adopting innovative teaching methodologies to enhance the learning experience in computer laboratories.
1.2 Research Content
In the laboratory teaching of engineering courses, students are often required to complete certain tasks that involve programming, drawing, and drafting plans. During these experimental tasks, teachers need to pay attention to the students' status and answer their questions. These inquiries may pertain to professional knowledge or the operation of computers and software tools. Students from China and many other Asian countries tend to be less inclined to communicate with teachers and rarely take the initiative to ask questions. Additionally, the equipment in computer laboratories can obstruct the view, making it difficult for teachers to understand students' learning status by observing their behavior. We propose a solution that combines privacy-protected emotion recognition and operation recognition. This solution applies AI to analyze student behavior in the laboratory teaching of engineering design courses. The system aids in the illustrative analysis and assessment of students' in-class actions and their zeal for learning throughout the instruction period. This allows educators to make immediate modifications in response to the prevailing conditions, consequently augmenting the efficacy of their teaching.
The usage scenario for applying artificial intelligence to computer laboratory behavior analysis is illustrated in Fig. 2. The camera captures real-time images, which are then transmitted to the backend server. The AI model in the server utilizes facial emotion recognition to assess the learning status of each student and aggregates the results. Additionally, monitoring software installed on the student terminals tracks the frequency of keyboard and mouse operations, collecting statistics at regular intervals, such as every 5 min. These statistics are then sent to the teacher’s computer according to the designated time intervals, for example, every 5 min. By analyzing these statistics, the teacher can understand the current learning status of the students and make timely adjustments to enhance the quality of teaching. For instance, if some students show a significantly lower frequency of keyboard and mouse operations compared to others in the past 10 min, it indicates that they may be facing difficulties and unable to complete their assignments. The teacher can inquire about their doubts and provide explanations for common issues. Usually, we believe that when students are immersed in their learning, their faces may appear neutral or relaxed, indicating their mental engagement. Their facial muscles are not tense, and their expressions are free from signs of stress or distraction. When many students exhibit other emotions, it may indicate external disturbances, and their attention is not focused on learning. Teachers can understand the reasons behind this and maintain classroom order, ensuring that students concentrate on their studies. Teachers primarily focus on teaching during class and cannot constantly observe students’ dynamics. The implementation of a classroom behavior analysis model affords educators the ability to comprehend the learning states of their students, permitting immediate pedagogical adaptations. 
Consequently, this leads to an augmentation in teaching quality, promoting an enhanced educational experience.
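The interval statistics described above can be sketched as a simple aggregation step. The class and method names below are illustrative, not the system's actual API; the 5-minute window size follows the example in the text.

```python
from collections import defaultdict


class OperationCounter:
    """Aggregates keyboard/mouse event counts into fixed-length intervals.

    Illustrative sketch of the per-interval operation statistics; names
    and the default window size are assumptions, not the system's API.
    """

    def __init__(self, interval_seconds=300):  # 5-minute windows by default
        self.interval = interval_seconds
        self.counts = defaultdict(int)  # interval index -> event count

    def record_event(self, timestamp):
        """Called once per keyboard press or mouse click (timestamp in seconds)."""
        self.counts[int(timestamp) // self.interval] += 1

    def report(self, interval_index):
        """Event frequency for one window, e.g. to be sent to the teacher's PC."""
        return self.counts.get(interval_index, 0)
```

Per-student counts collected this way can then be compared across the class, so unusually low frequencies over the last couple of windows stand out.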
Smart education, as the new trend in the field of education, has been widely implemented in university education globally. By introducing various technological means, smart education provides students with a more efficient and personalized learning environment. However, with the introduction of facial recognition, desktop monitoring, and other technologies, concerns about student privacy have also emerged. Although these monitoring technologies can effectively gather information about students' learning status, they can also significantly infringe upon students' privacy. To address this problem, we propose a privacy-protected emotion recognition model. This technique applies an irreversible frequency-domain transformation to the image data, ensuring that the collection and analysis of students' learning status information avoid direct access to their personal privacy. The core of this technique lies in its ability to effectively acquire and utilize students' learning data while preserving privacy. With its help, educational institutions can better understand and enhance students' learning outcomes, while students enjoy higher quality educational services without sacrificing their personal privacy.
2 Related Works
2.1 Application of AI in Education
Since the advent of artificial intelligence (AI) in the latter part of the twentieth century, it has progressively infiltrated the realm of education. The journey began with the deployment of rule-based expert systems in its infancy, which gave way to AI-infused data processing systems by the middle of the 1980s. Subsequently, in the mid-2000s, we observed the incorporation of machine learning techniques into AI, further expanding its potential.
The temporal trajectory of AI implementation in educational contexts reflects a marked uptick. As we advanced into the second decade of the twenty-first century, there was an observable surge in academic exploration concerning AI's relevance and applicability in pedagogical scenarios. The broad spectrum of these applications can be compartmentalized into two primary domains, which are as follows:
1. Amplifying Pedagogical Efficacy and Learner Engagement
In this bracket, AI deployments chiefly encapsulate the accumulation and arrangement of pedagogical resources, smart search functionality, bespoke suggestions, and beyond. Contemporary developments are steered toward intelligent pedagogical automatons and augmented reality utilities that bolster the elucidation and display of theories and situations to students. These applications harness the power of three-dimensional technology and immersive simulations as instructional instruments, thereby supporting students in achieving a superior understanding of the imparted principles [6, 7]. The cardinal aim of this classification of AI applications is to captivate learners, escalate their curiosity, and cultivate a more profound learning encounter.
2. Augmenting Pedagogical Administration and Evaluation
Artificial intelligence finds its application in a myriad of forms and roles within the sphere of education. As the pedagogical domain progresses, scholars are endeavoring to utilize sophisticated AI methodologies to tackle increasingly intricate predicaments, encompassing aid in scholastic administration and managerial duties. This equips educators to accomplish their administrative obligations, such as evaluation and delivering feedback to students, with enhanced efficacy [8,9,10]. The integration of AI can appreciably diminish teachers' bureaucratic chores and task load, particularly pertaining to diverse administrative roles, thus empowering them to focus on their primary responsibility of instruction [11].
The use of AI in educational administration is a vital facet of ongoing research. This manuscript specifically investigates classroom behavior analysis as a pedagogical evaluation approach, with the intention of aiding educators in augmenting the caliber of their instruction.
2.2 Computer Vision Technology Applications
Computer vision is currently a vibrant area of exploration in the landscape of AI. In the educational sphere, automated facial recognition (AFR) is recognized as a high-priority application. Within the educational context, Rafika [12] suggested the application of facial identification to monitor student participation. In addition, Roy [13] and Savchenko [14] introduced the concept of employing facial emotion detection to ascertain students’ engagement degrees during virtual learning, thus tackling the challenge of waning concentration due to limited human oversight in online courses.
Although the referenced studies have probed the capabilities of image processing technology within educational environments, they have overlooked two significant considerations. The first concern is data leakage. Considering that image processing technology in educational settings is mainly applied to minors, any unintentional exposure of image data can lead to unfavorable outcomes. While facial identification technology has demonstrated advantages in the field of education, it has also sparked various issues, primarily relating to data privacy. The collection and use of facial imagery can give rise to negative consequences stemming from data leakage [15]. Andrejevic [16] points out that AFR technology provides solutions for campus safety, automated enrollment, and student emotion recognition, but data leakage continues to be a substantial source of apprehension.
To address these challenges, we propose an AI-based solution specifically designed for computer laboratory teaching. This solution circumvents the problem of data leakage while safeguarding student privacy. By implementing our proposed approach, educational institutions can harness the potential of computer vision technology in a secure and privacy-conscious manner.
3 Materials and Methods
In the realm of computational laboratory behavioral analysis, there is a significant demand for artificial intelligence-based solutions that uphold user privacy and offer rapid response times. To cater to these unique application prerequisites, we present a novel proposition, the intelligent engineering design system (IEDS). The specific components and functionalities of this solution are depicted in Fig. 3.
1. The cameras capture computer laboratory image data.
2. Images captured in the computer laboratory typically contain multiple students, so a preliminary step is required before analysis: individual faces within these images must be partitioned into separate images for subsequent AI model analysis. Because image dimensions after semantic segmentation vary, we resize all images to a uniform 224 × 224 × 3 pixels for model training.
3. A privacy protection module assures biometric privacy. Using the Fourier transform, we eliminate the low-frequency information within images, converting them into high-frequency representations. These transformed images, while unrecognizable to the human eye, do not interfere with the AI model's training efficacy.
4. Analysis module: the analysis module consists of an emotion recognition model and a clustering model. The emotion recognition model, namely Emotional Recognition with Attention Mechanism (ERAM), shown in Fig. 4, classifies the privacy-protected images. The clustering model performs clustering calculations on the computer operation frequencies of all students to identify any abnormal operation patterns.
5. After the segmented images are analyzed individually, we count the number of images for each emotional category. Importantly, the outcomes from both the visual and cluster models are not linked to specific individuals, safeguarding students' behavioral privacy since their identities remain undisclosed.
6. The server sends the statistical results to the teacher's working PC.
The technical principles and model structure are detailed in the Supplementary Information.
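As an illustration of the privacy module's principle (item 3 above), the following sketch removes the low-frequency content of a grayscale image with a Fourier-domain high-pass mask. The cutoff `radius` is an assumed parameter for illustration, not a value from the system.

```python
import numpy as np


def high_frequency_only(image, radius=8):
    """Keep only the high-frequency content of a grayscale image.

    Sketch of the Fourier-based privacy step: low frequencies inside
    `radius` (an assumed cutoff) are zeroed, making the result
    unrecognizable to the eye while retaining high-frequency structure.
    """
    f = np.fft.fftshift(np.fft.fft2(image))  # centre the spectrum
    h, w = image.shape
    cy, cx = h // 2, w // 2
    yy, xx = np.ogrid[:h, :w]
    mask = (yy - cy) ** 2 + (xx - cx) ** 2 > radius ** 2  # True = keep
    f *= mask  # zero out low frequencies (including the DC component)
    return np.real(np.fft.ifft2(np.fft.ifftshift(f)))
```

Because the low-frequency (semantic) content is discarded rather than encrypted, the transformation cannot be inverted to recover the original face.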
4 Experiment
4.1 Experimental Setup
The models were trained on a cloud server provided by the Kaggle platform. After training, they were deployed on the laboratory server for testing. A comprehensive overview of the experimental environment can be found in Table 1.
4.2 Emotion Recognition Model Experiment
4.2.1 Dataset
Three publicly available datasets for emotion recognition, FERPlus, CK+, and ExpW, were used in this experiment.
FERPlus is an enhanced dataset purposed for facial expression recognition, provided by Microsoft Research. As an ameliorated version of the original Facial Expression Recognition (FER) dataset, FERPlus incorporates more precise labels and additional categories of facial expressions. The FERPlus dataset encompasses over 35,000 grayscale images divided into eight categories: Neutral, Happy, Sad, Surprise, Anger, Fear, Disgust, and Contempt. Compared to the original FER dataset, FERPlus has seen improvements, such as resolving labeling ambiguities through multi-person annotations and introducing an unlabeled category to address faces in images that do not express clear emotions.
The Extended Cohn–Kanade (CK+) dataset is a widely utilized resource in the field of facial expression recognition. It includes 593 sequences from 123 subjects, covering seven different facial expressions (Anger, Disgust, Fear, Happy, Sad, Surprise, and Neutral). These expressions typically commence from a neutral state, transitioning to a specific emotion. The final frame of each sequence is often labeled as the peak expression. A distinguishing feature of this dataset is its abundant individual variations, adding a layer of complexity to the facial expression recognition task.
The ExpW dataset is a large-scale collection curated for facial expression recognition. It comprises approximately 100,000 facial images captured in the wild, embracing seven fundamental facial expressions (Happy, Sad, Fear, Surprise, Anger, Disgust, and Neutral). A key advantage of ExpW is its coverage of a wide array of environmental conditions and individual variances, which bolsters the robustness of facial expression recognition systems. Moreover, relative to many other datasets, it boasts a larger scale, offering more data for training deep learning models.
Typically, images employed in computer vision research adhere to dimensions of 224 × 224 pixels, with pretrained models accessible from the Keras library commonly necessitating an input layer dimension of 224 × 224 × 3. To ensure uniformity and enable comparisons with alternate models, our model’s input layer dimensions have been configured to 224 × 224 × 3. Nevertheless, this configuration does not correspond with the image sizes provided by various datasets. For instance, the FERPlus dataset offers images of 48 × 48 pixels. To reconcile this incongruity, we deploy the Python Imaging Library (PIL) for pre-processing and resizing the emotion recognition dataset to be compatible with the 224 × 224 × 3 dimensions.
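The resizing step can be sketched with PIL as follows. The bilinear resampling mode is an assumption, since the text does not specify the interpolation used.

```python
import numpy as np
from PIL import Image


def prepare_for_model(image, size=(224, 224)):
    """Resize a dataset image (e.g. a 48 x 48 FERPlus sample) to the
    224 x 224 x 3 input expected by the model.

    Minimal sketch using PIL; bilinear resampling is an assumed choice.
    """
    img = Image.open(image) if isinstance(image, str) else image
    img = img.convert("RGB")                # grayscale -> 3 channels
    img = img.resize(size, Image.BILINEAR)  # upscale to 224 x 224
    return np.asarray(img)                  # shape (224, 224, 3)
```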
4.2.2 Model Training
The process of model training necessitates the application of a loss function, employed to gauge the discrepancy between the target's true and predicted values. To minimize the loss, we utilize an optimizer, more specifically the stochastic gradient descent (SGD) approach in this context. The majority of optimization algorithms harnessed in deep learning are attributable to this method. Two fundamental parameters influencing the progression of SGD training are the learning rate and batch size. Following an examination of the dataset and image dimensions, and consequent to numerous trials, the optimal training parameters are delineated in Table 2.
During network training, the learning rate (lr) governs the pace of parameter updates. A minuscule learning rate engenders a slower parameter update process, potentially trapping the model at local minima. Conversely, a substantial learning rate may induce oscillations during the search process, potentially hovering near local optimum values and obstructing network convergence.
In our training regimen, we espouse a learning rate decay strategy, commencing with a relatively high initial learning rate which subsequently diminishes as training advances. Our experiment implements a stepwise learning rate decay strategy, wherein the learning rate is modified every step size epochs. The adjustment factor, gamma, determines the new learning rate at each adjustment, calculated as lr * gamma. The training curve of ERAM is seen in Fig. 5.
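The stepwise decay rule described above (multiply the learning rate by gamma once every step_size epochs) can be expressed directly; it mirrors PyTorch's StepLR scheduler. The parameter values in the example are illustrative, not the values in Table 2.

```python
def stepwise_lr(initial_lr, gamma, step_size, epoch):
    """Learning rate at a given epoch under stepwise decay: the rate is
    multiplied by `gamma` once every `step_size` epochs.
    """
    return initial_lr * gamma ** (epoch // step_size)


# Example: initial lr 0.01, halved every 10 epochs (assumed values).
schedule = [stepwise_lr(0.01, 0.5, 10, e) for e in range(30)]
```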
4.3 Cluster Analysis Experiment
Our implementation of the k-means algorithm necessitates an a priori determination of k, which poses a particular challenge in the realm of k-means computation. The configuration of k is directly instrumental in determining the efficacy of the clustering algorithm. In the context of a computer laboratory teaching scenario, the students engaging in practical instruction in the laboratory ordinarily hail from the same class, and the practical syllabus remains consistent for all students. As such, these students are expected to exhibit substantial similarity in their keyboard and mouse operation frequencies during these practical sessions.
Nevertheless, a handful of students may diverge from the majority in operational frequency, potentially due to course difficulties or personal factors. It is in this scenario that the k-means clustering algorithm proves useful: it identifies the students with aberrant operational frequencies so that their learning challenges can be addressed.
5 Results
5.1 Performance of the Model
To evaluate the performance of our proposed emotion recognition attention model (ERAM), we drew comparisons with several existing emotion recognition architectures, utilizing the same datasets for testing.
The VGG16 [17] model, a convolutional neural network proposed by Simonyan and colleagues, showcased remarkable performance in the 2014 ImageNet Image Classification and Localization Challenge. Nevertheless, it must be noted that VGG16 is not a model specifically designed for emotion recognition tasks. Additionally, we referred to a range of contemporary research papers on emotion recognition to integrate other models for comparative analysis. The results of these comparative evaluations are encapsulated in Table 3.
The ERAM model achieved an accuracy of 87.64% on the FERPlus dataset, 94.46% on CK+, and 82.38% on ExpW, the highest among the compared models on all three datasets. ERAM also has the shortest response time, making it the fastest of the compared models. Overall, ERAM offers the best combination of accuracy and speed in this comparison.
5.2 Ablation Experiments
In the ERAM model, we have introduced the CBAM. Conceptually, the CBAM module can be effortlessly integrated at any position within the CNN architecture. However, in practical terms, the positioning of the CBAM module within the model does influence the model's overall precision. To identify the optimal location for the CBAM module and thereby enhance the model's performance, we conducted a series of ablation studies, placing the CBAM module at various positions within the model. For a detailed account of these findings, refer to Table 4.
Each group was trained on the FERPlus dataset to verify its effect; the training curve of each group is shown below. The training results of these models are compared in Table 5.
Model_1 placed the CBAM module at the front, resulting in an accuracy of only 82.64%. The training curve is shown in Fig. 6.
Model_2 placed the CBAM module after all convolutional layers, resulting in the worst performance with an accuracy of only 80.78%. The training curve is shown in Fig. 7.
Model_3 placed the CBAM module in the middle position of the model, achieving an accuracy of 84.69%. The training curve is shown in Fig. 8.
The best performance was observed in ERAM, where the CBAM module was placed after the first convolutional layer. It achieved the highest accuracy of 87.64%. The training curve is shown in Fig. 9.
Through these ablation experiments, we conclude that the ERAM model performs best with the CBAM module placed after the first convolutional layer.
5.3 Cluster Analysis
The challenge of the K-means clustering algorithm lies in determining the value of k, the number of clusters. In our experiment, we used the elbow method to determine k. K-means aims to minimize the squared error between samples and centroids; the sum of squared distances between the centroid and the samples within each cluster is known as the distortion. For a cluster, a lower distortion indicates a tighter grouping of its members and a better clustering result, while a higher distortion implies a looser internal structure and a poorer result. The distortion decreases as k increases, but for data with a certain degree of separation, it improves sharply up to a critical point and declines only gradually thereafter. The k at this critical point can be taken as giving good clustering performance.
In a class, the number of students is typically around 40–60, so the number of clusters is relatively small. In our experiment, we considered K values from 2 to 9 and calculated the distortion for each. As the cluster metric we used the weighted average of mean centroid distances, which is expressed as follows:

$$\mathrm{metric}(K) = \sum_{i=1}^{K} \frac{n_i}{N} \left( \frac{1}{n_i} \sum_{x \in C_i} \lVert x - \mu_i \rVert \right),$$

where $C_i$ is the $i$-th cluster with $n_i$ members and centroid $\mu_i$, and $N$ is the total number of samples.
From Fig. 10, it can be observed that the cluster metric declines fastest up to K = 5; therefore, k is set to 5.
Testing with actual data collected from the computer laboratory confirms that clustering performance is best when k is set to 5. Each cluster exhibits distinct differences in keyboard and mouse operation frequencies. The cluster with the fewest members is treated as the anomalous cluster, containing the students who require attention.
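The elbow procedure can be sketched end to end with a plain implementation of k-means and the distortion. The data below is synthetic, standing in for the real operation-frequency logs.

```python
import numpy as np


def kmeans(data, k, n_iter=100, seed=0):
    """Plain k-means (Lloyd's algorithm); returns centroids and labels."""
    rng = np.random.default_rng(seed)
    centroids = data[rng.choice(len(data), k, replace=False)]
    for _ in range(n_iter):
        # Assign each sample to its nearest centroid.
        d = np.linalg.norm(data[:, None, :] - centroids[None, :, :], axis=2)
        labels = d.argmin(axis=1)
        # Recompute centroids (keep the old one if a cluster goes empty).
        new = np.array([data[labels == i].mean(axis=0) if np.any(labels == i)
                        else centroids[i] for i in range(k)])
        if np.allclose(new, centroids):
            break
        centroids = new
    return centroids, labels


def distortion(data, centroids, labels):
    """Sum of squared distances between samples and their centroids."""
    return float(((data - centroids[labels]) ** 2).sum())


# Elbow sweep over K = 2..9: the K where the distortion's decline
# levels off is taken as the number of clusters.
def elbow_sweep(data, k_range=range(2, 10)):
    return {k: distortion(data, *kmeans(data, k)) for k in k_range}
```

In the teaching scenario, each sample would be a student's keyboard/mouse frequency vector, and the smallest resulting cluster flags the students needing attention.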
6 Discussion
6.1 Technological Analysis
Experimental evaluations reveal that our proposed emotion recognition attention model (ERAM) augmented with the convolutional block attention module (CBAM) [23] surpasses other emotion recognition models in terms of both precision and response speed. Considering the model's intended deployment within the engineering design course's laboratory teaching scenario, a privacy module has also been incorporated. This privacy protection module exhibits certain distinctive features.
6.1.1 Performance of the ERAM Model
We proposed the emotion recognition attention model (ERAM), augmented with CBAM, a highly effective attention mechanism used in deep learning. Attention mechanisms in general improve model performance by allowing the model to focus on the parts of the input most important to the task at hand. CBAM does this specifically for CNNs, which are used primarily for image-related tasks. It is a lightweight, general-purpose module that can significantly improve the performance of convolutional neural networks on a variety of image-related tasks, and it can be seamlessly integrated into any CNN architecture with negligible additional overhead.
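As a sketch of what CBAM computes, the following minimal NumPy version applies channel attention (a shared two-layer MLP over average- and max-pooled channel descriptors) followed by spatial attention (a 7 × 7 convolution over the channel-wise average and max maps). The weights are random stand-ins, not trained parameters, and the loop-based convolution is for clarity only.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def channel_attention(x, w1, w2):
    """x: (C, H, W); w1: (C//r, C) and w2: (C, C//r) form the shared MLP."""
    avg = x.mean(axis=(1, 2))  # average-pooled channel descriptor, (C,)
    mx = x.max(axis=(1, 2))    # max-pooled channel descriptor, (C,)
    att = sigmoid(w2 @ np.maximum(w1 @ avg, 0) + w2 @ np.maximum(w1 @ mx, 0))
    return x * att[:, None, None]


def spatial_attention(x, kernel):
    """x: (C, H, W); kernel: (2, 7, 7) convolves the [avg; max] channel maps."""
    stacked = np.stack([x.mean(axis=0), x.max(axis=0)])  # (2, H, W)
    padded = np.pad(stacked, ((0, 0), (3, 3), (3, 3)))
    h, w = x.shape[1:]
    att = np.zeros((h, w))
    for i in range(h):          # naive 7x7 convolution, stride 1, pad 3
        for j in range(w):
            att[i, j] = np.sum(padded[:, i:i + 7, j:j + 7] * kernel)
    return x * sigmoid(att)[None, :, :]


def cbam(x, w1, w2, kernel):
    """Channel attention followed by spatial attention, as in CBAM."""
    return spatial_attention(channel_attention(x, w1, w2), kernel)
```

Because both attention maps pass through a sigmoid, each stage rescales the feature map by factors in (0, 1), emphasizing informative channels and locations without changing the tensor's shape.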
During the study, we found that the CBAM module placed after the first convolutional layer of the model has the best effect through the comparison of ablation experiments, and we determined the model structure of ERAM according to this (see the Appendix for details).
In comparisons based on several publicly available datasets, ERAM outperforms other emotion recognition models in terms of accuracy and response speed.
6.1.2 Privacy Safeguarding
Predominantly, human perception is biased toward the low-frequency components of an image, whereas convolutional neural networks (CNNs) can discern both low- and high-frequency components. Essentially, every image or dataset encapsulates both semantic (texture, or low-frequency) and high-frequency information, albeit in varying proportions. For datasets drawn from the same distribution, the semantic and high-frequency components each exhibit their own characteristic distributions.
To simplify, for a dataset bearing a consistent category label, collated across diverse scenes, the semantic distribution within each scene should be nearly homogeneous (given the same classification). However, the high-frequency distribution might not be explicitly tied to a particular domain, and it could encompass category-specific information. It might also contain noise extraneous to the distribution, which could be deleterious to model training and impact its generalizability. As humans are relatively insensitive to high-frequency components, they predominantly rely on semantics for annotation, often overlooking high-frequency information.
During the initial phase of CNN training, the model primarily leverages semantic or low-frequency information. As the training loss begins to stagnate, additional high-frequency components are incorporated to further reduce the loss. Therefore, the direct use of high-frequency images does not necessarily negatively affect accuracy.
We implement the Fourier transform for privacy preservation subsequent to image segmentation. Experimental findings corroborate that the processed images, unrecognizable to the human eye, do not obstruct the model training and recognition procedure.
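As an illustration, this high-pass privacy step can be sketched with numpy's FFT. The cut-off `radius` below is an arbitrary illustrative value, not the one used in our experiments:

```python
import numpy as np

def keep_high_frequency(img, radius=8):
    """Remove the low-frequency (semantic) content of a grayscale image
    by zeroing a square of half-width `radius` around the centre of its
    shifted 2-D Fourier spectrum, then transforming back. The result is
    unrecognizable to the eye but retains the high-frequency detail."""
    f = np.fft.fftshift(np.fft.fft2(img))
    h, w = img.shape
    cy, cx = h // 2, w // 2
    f[cy - radius:cy + radius, cx - radius:cx + radius] = 0  # drop low freqs
    return np.real(np.fft.ifft2(np.fft.ifftshift(f)))

# A 224 x 224 grayscale stand-in for a segmented face crop
img = np.random.default_rng(0).random((224, 224))
hf = keep_high_frequency(img)
```

Because the zeroed square contains the DC component, the filtered image has zero mean; only the fine-grained texture that the CNN exploits in later training stages survives.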
Before initiating the emotion recognition task, we applied privacy-preserving measures to the images, preserving only the high-frequency information. We executed three sets of comparative experiments, using privacy-preserving processed images versus raw images, with the results delineated in Table 6.
Experimental data substantiate that while privacy preservation eradicates the low-frequency information within images, it does not detrimentally affect classification recognition. Notably, the difference in model precision is less than 1%.
6.2 Analysis of Teaching Applications
To gauge the real-world efficacy of the IEDS, we enlisted the participation of four instructors and 413 students. The students were distributed across eight classes, all in the same year and major, with about 50 students per class: 277 male (67.1%) and 136 female (32.9%), as shown in Fig. 11.
During the engineering design course, half of these classes were instructed with the IEDS system, whereas the remaining classes were taught traditionally. The experiment spanned 1 month, after which all participating students were examined at the culmination of the course. We used examination scores as the benchmark for pedagogical effectiveness. Four teachers took part in the control-group teaching experiment; the associated distribution of educators and students is detailed in Table 7.
The primary users of the IEDS system are instructors, and to account for the effect of diverse teaching methodologies, we solicited teachers varying in both teaching experience and gender, as delineated in Table 8.
Upon the conclusion of the engineering design course, we administered examinations to all students across the eight classes, with the results shown in Table 9.
Classes taught with the IEDS system averaged 82.10 points on the examination, whereas classes taught without it averaged 75.71 points, an improvement of 8.44%. Since examination scores serve as our proxy for teaching effect, this indicates that teaching effectiveness increased by 8.44% after adopting the IEDS system.
Since the direct users of the IEDS system are teachers, Table 10 breaks the results down by teacher.
As Table 10 shows, every teacher's effectiveness improved after adopting the IEDS system: Teacher A by 5.73%, Teacher B by 4.57%, Teacher C by 9.07%, and Teacher D by 14.24%. Teachers A and B are novice teachers, with an average of 1.25 years of teaching experience and an average improvement of 5.15%; Teachers C and D are veteran teachers, with an average of 9.25 years of experience and an average improvement of 11.65% in teaching effectiveness.
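The reported percentages can be checked directly from the table values:

```python
# Overall effect: class averages with and without the IEDS system (Table 9)
avg_ieds, avg_control = 82.10, 75.71
improvement = (avg_ieds - avg_control) / avg_control * 100   # about 8.44 %

# Per-group averages from the per-teacher improvements (Table 10)
novice = (5.73 + 4.57) / 2     # Teachers A and B, about 5.15 %
veteran = (9.07 + 14.24) / 2   # Teachers C and D, about 11.65 %
```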
Our initial hypothesis was that novice teachers, given their relative youth, would be more open to integrating new technologies into their teaching methodology. However, the actual enhancement in teaching efficacy was relatively modest, averaging only an increase of 5.15% among younger teachers compared to an average improvement of 11.65% for their veteran counterparts. These results suggest that seasoned teachers derived greater benefit from the IEDS system. We surmise that novice teachers, given their relative lack of experience, tend to concentrate their efforts primarily on the act of teaching itself, leaving little bandwidth to scrutinize the information rendered by the IEDS system.
Operating as an adjunct to manual observations, the IEDS system furnishes statistical data about students' emotional states and operational frequency. This allows teachers to tailor their instructional strategies based on student responses, thereby enhancing overall teaching effectiveness.
The reasons why experienced teachers show better results in accepting and applying new technology can be scientifically analyzed and explained from multiple dimensions. First, these teachers possess extensive teaching experience, enabling them to deeply understand the practical application value of new technologies in education. According to the Dreyfus model of skill acquisition [24], individuals who have reached the “expert” stage can make decisions based on intuition and deep understanding. Therefore, experienced teachers can more accurately select and use technological tools that are most beneficial for teaching.
Moreover, experienced teachers have usually developed a mature set of teaching methods, allowing them to integrate new technologies into the classroom more effectively. The Technological Pedagogical Content Knowledge (TPACK) [25] framework emphasizes the importance of integrating content knowledge, pedagogical knowledge, and technological knowledge to achieve effective teaching. The maturity of experienced teachers in these three areas makes them better at incorporating new technology into their existing teaching systems, rather than merely pursuing the novelty of technology.
From the perspective of career development motivation, experienced teachers may view learning and applying new technologies as part of their professional growth to maintain the modernity and relevance of their teaching. Lifelong learning theory emphasizes the need for continuous education to adapt to a rapidly changing world, while career self-efficacy theory reveals how successful experiences can enhance an individual's belief in success, thus inspiring experienced teachers to explore new technologies more actively.
Furthermore, experienced teachers usually have reached a certain level of stability in their careers, possessing more confidence and resources to try and implement new teaching methods and technologies without overly worrying about the consequences of failure. Both self-determination theory (SDT) [26] and achievement motivation theory (AMT) [27] support this, with the former emphasizing the fulfillment of psychological needs for autonomy, competence, and relatedness, and the latter focusing on the individual's motivation to achieve and control.
Supported by these theories, we can see that experienced teachers perform better in accepting and applying new technologies not only due to their technical proficiency, but also because of their unique advantages in teaching practice, career motivation, and psychological state. This phenomenon is also evident in other industries such as healthcare, finance, and manufacturing, illustrating the significant impact of professional experience and a positive attitude toward career development on technology acceptance and application. Therefore, for any industry, understanding the specific needs and backgrounds of users and designing training and support strategies that meet these needs are key to ensuring the successful promotion and application of technology.
6.3 Deficiency and Improvement Direction
The IEDS system furnishes pertinent information to educators, thereby enabling them to leverage their expertise to fine-tune their teaching methodologies, consequently augmenting instructional efficacy. While this system proves beneficial for seasoned teachers, it presents certain limitations in real-world educational settings.
First, instructors who are relatively new to the profession might struggle to draw meaningful conclusions or interpret the data provided by the system. As such, future investigative efforts should strive to enhance the system's ability to delineate the association between recognition results and instructional efficiency. By comprehending the distinctive characteristics of individual educators, specific course content, and varied student clusters, the system could potentially offer bespoke pedagogical suggestions, thereby augmenting the overall effectiveness of teaching.
Second, the effectiveness of the cluster analysis sometimes falls short. The purpose of clustering the operation frequencies is to identify students in abnormal states and address their learning difficulties. In practical teaching applications, clustering effectiveness may be compromised because only limited signals are captured: solely the keyboard and mouse operation frequencies. Future research could capture keystroke-level information to enrich the input to the clustering analysis, enabling more precise and accurate results.
7 Conclusions
In this manuscript, we propose the IEDS system for the behavioral analysis of students enrolled in an engineering design curriculum. This solution, catering to contemporary teaching approaches, assists educators in facilitating engineering education, thereby enhancing the quality of instruction. Our method involves the analysis of students’ learning states via emotion recognition and frequency analysis of operations, demonstrating both high accuracy and efficiency. We amass data for analysis and render them in real time, thereby enabling teachers to optimize the effectiveness of their engineering course instruction.
The potential of artificial intelligence in the realm of education warrants further exploration. We posit that future research directions should focus on the following aspects:
1. Privacy Protection: While safeguarding privacy, it remains critical to effectively gather and utilize student learning data. With the deployment of such technology, educational institutions can gain superior insights into and improve their students' learning outcomes, while students benefit from enhanced educational services.
2. Reduction of Learning Costs: Presently, the effectiveness of the IEDS system largely depends on the educators' experience. Future research should incorporate expert systems, enabling novice teachers to utilize them with ease and foster engineering education.
Data Availability
The FER-2013 dataset can be downloaded from https://www.kaggle.com/c/challenges-in-representation-learning-facial-expression-recognition-challenge/data (accessed on 11 May 2023). The CK+ dataset can be downloaded from https://www.kaggle.com/datasets/davilsena/ckdataset (accessed on 11 May 2023). The ExpW dataset can be downloaded from https://www.kaggle.com/datasets/mohammedaaltaha/expwds (accessed on 11 May 2023).
References
Hoel, T., Mason, J.: Standards for smart education – towards a development framework. Smart Learn. Environ. 5, 3 (2018). https://doi.org/10.1186/s40561-018-0052-3
Martín, A.C., Alario-Hoyos, C., Kloos, C.D.: Smart education: a review and future research directions. Proceedings 31, 57 (2019). https://doi.org/10.3390/proceedings2019031057
Li, L., Chen, C.P., Wang, L., Liang, K., Bao, W.: Exploring artificial intelligence in smart education: real-time classroom behavior analysis with embedded devices. Sustainability 15, 7940 (2023). https://doi.org/10.3390/su15107940
Cameron, K.S., Whetten, D.A.: A model for teaching management skills. Exchange Org. Behav. Teach. J. 8(2), 10–15 (1983)
Whetten, D.A., Clark, S.C.: An integrated model for teaching management skills. J. Manag. Educ. 20(2), 152–181 (1996). https://doi.org/10.1177/105256299602000202
Timms, M.J.: Letting artificial intelligence in education out of the box: educational cobots and smart classrooms. Int. J. Artif. Intell. Educ. 26, 701–712 (2016)
Mikropoulos, T.A., Natsis, A.: Educational virtual environments: a ten-year review of empirical research (1999–2009). Comput. Educ. 56, 769–780 (2011). https://doi.org/10.1016/j.compedu.2010.10.020
Rus, V., D’mello, S., Hu, X., Graesser, A.: Recent advances in conversational intelligent tutoring systems. AI Mag. 34, 42–54 (2013). https://doi.org/10.1609/aimag.v34i3.2485
Sharma, R.C., Kawachi, P., Bozkurt, A.: The landscape of artificial intelligence in open, online and distance education: promises and concerns (2019). https://doi.org/10.5281/zenodo.3730631
Pokrivcakova, S.: Preparing teachers for the application of AI-powered technologies in foreign language education. J. Lang. Cult. Educ. 7, 135–153 (2019). https://doi.org/10.2478/jolace-2019-0025
Chassignol, M., Khoroshavin, A., Klimova, A., Bilyatdinova, A.: Artificial Intelligence trends in education: a narrative overview. Procedia Comput. Sci. 136, 16–24 (2018). https://doi.org/10.1016/j.procs.2018.08.233
Rafika, A.S., Sudaryono, Hardini, M., Ardianto, A.Y., Supriyanti, D.: Face recognition based artificial intelligence with AttendX technology for student attendance (2022). https://doi.org/10.1109/icostech54296.2022.9829122
Roy, M.L., Malathi, D., Jayaseeli, J.D.D.: Facial recognition techniques and their applicability to student concentration assessment: a survey, pp. 213–225 (2022). https://doi.org/10.1007/978-981-16-5652-1_18
Savchenko, A.V., Savchenko, L.V., Makarov, I.: Classifying emotions and engagement in online learning based on a single facial expression recognition neural network. IEEE Trans. Affect. Comput. 13, 2132–2143 (2022). https://doi.org/10.1109/taffc.2022.3188390
Bu, Q.: The global governance on automated facial recognition (AFR): ethical and legal opportunities and privacy challenges. Int. Cybersecur. Law Rev. 2, 113–145 (2021)
Andrejevic, M., Selwyn, N.: Facial recognition technology in schools: critical questions and concerns. Learn. Media Technol. 45, 115–128 (2020)
Simonyan, K., Zisserman, A.: Very deep convolutional networks for large-scale image recognition. https://arxiv.org/abs/1409.1556. Accessed 11 May 2023.
Houshmand, B., Mefraz Khan, N.: Facial expression recognition under partial occlusion from virtual reality headsets based on transfer learning. In: Proceedings of IEEE 6th International Conference on Multimedia Big Data (BigMM), pp. 70–75 (2020)
Zhao, G., Yang, H., Yu, M.: Expression recognition method based on a lightweight convolutional neural network. IEEE Access 8, 38528–38537 (2020)
Saabni, R., Schclar, A.: Facial expression recognition using combined pre-trained convnets. Comput. Sci. Inf. Technol. 95, 95–106 (2020)
Lian, Z., Li, Y., Tao, J., Huang, J., Niu, M.: Region based robust facial expression analysis. In: Proceedings of 1st Asian Conference on Affective Computing and Intelligent Interaction, pp. 1–5 (2018).
Li, M., Xu, H., Huang, X., Song, Z., Li, X., Li, X.: Facial expression recognition with identity and emotion joint learning. IEEE Trans. Affect. Comput. (2018). https://doi.org/10.1109/TAFFC.2018.2880201
Woo, S., Park, J., Lee, J.Y., et al.: CBAM: convolutional block attention module. In: Proceedings of the European Conference on Computer Vision (ECCV), pp. 3–19 (2018)
Dreyfus, S.E., Dreyfus, H.L.: A Five-Stage Model of the Mental Activities Involved in Directed Skill Acquisition. University of California, Berkeley (1980)
Mishra, P., Koehler, M.J.: Technological pedagogical content knowledge: a framework for teacher knowledge. Teach. Coll. Rec. 108(6), 1017–1054 (2006)
Deci, E.L., Ryan, R.M.: Intrinsic Motivation and Self-determination in Human Behavior. Springer Science & Business Media, Berlin (2013)
McClelland, D.: Achievement Motivation Theory. Organizational Behavior, vol. 1, pp. 46–60. Routledge, London (2015)
Funding
This research received no external funding.
Author information
Contributions
Conceptualization: J. Hu and Z. Huang; methodology: Z. Huang; software: L. Xu; validation: J. Li; formal analysis: Z. Huang and Y. Zou; investigation: Y. Zou; resources: J. Hu; data curation: L. Xu; writing—original draft preparation: J. Hu. All authors have read and agreed to the published version of the manuscript.
Ethics declarations
Conflict of Interest
The authors declare no conflict of interest.
Informed Consent
Informed consent was obtained from all subjects involved in the study.
Additional information
Publisher's Note
Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.
Appendix
1.1 Emotional Recognition Model
This study proposes a high-performance vision model that responds quickly and with high accuracy. We named the model ERAM (emotional recognition with attention mechanism). The structure of the ERAM model is shown in Fig. 4.
1. Input
Within the privacy protection module, harvested images are segmented into dimensions of 224 × 224 × 3 based on facial criteria. Numerous public emotion recognition datasets conform to this size, so adapting our input to 224 × 224 × 3 facilitates the incorporation of public datasets in the training process.
2. Block 1
Convolution forms the most prevalent operation in the realm of neural networks. Block 1 constitutes a convolutional layer that is designed to extract distinctive features from the input data.
3. Block 2
Block 2 is a pooling layer, designed to condense the spatial dimensions of the feature map from the prior layer while preserving crucial information. We employ max-pooling, which captures the maximum value from the feature points in the vicinity. Max-pooling proficiently retains texture features, reduces data dimensionality, amplifies computational efficiency, augments model robustness, and mitigates overfitting.
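A minimal numpy sketch of 2 × 2 max-pooling (the pooling size is illustrative):

```python
import numpy as np

def max_pool2d(x, size=2):
    """Max-pooling on a 2-D feature map with stride equal to the pool
    size: keep the strongest activation in each size x size window,
    shrinking each spatial dimension by that factor."""
    h, w = x.shape
    h2, w2 = h // size, w // size
    return x[:h2 * size, :w2 * size].reshape(h2, size, w2, size).max(axis=(1, 3))

fmap = np.array([[1, 2, 0, 1],
                 [3, 4, 1, 0],
                 [0, 1, 5, 6],
                 [2, 1, 7, 8]], dtype=float)
max_pool2d(fmap)  # -> [[4., 1.], [2., 8.]]
```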
4. Block 3
The CBAM is a versatile, low-footprint module that can be seamlessly embedded into any neural network architecture with minimal additional computational load. Typically, CBAM parameters number around 0.1 million, allowing it to elevate the precision of lightweight models without increasing the model's size. CBAM retains input dimensions and can be smoothly integrated into the network model. Within the ERAM, CBAM is appended following the convolutional layer to enhance the model's accuracy.
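For illustration, a much-simplified numpy sketch of CBAM's two branches follows. The spatial branch here replaces the usual 7 × 7 convolution with a plain element-wise combination, so the sketch conveys the data flow of the module rather than its exact computation:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cbam(x, w1, w2):
    """Simplified CBAM forward pass on a feature map x of shape (C, H, W).
    w1 (C x C/r) and w2 (C/r x C) are the shared-MLP weights of the
    channel branch; the spatial branch's 7x7 convolution is replaced by
    an element-wise combination for brevity."""
    # Channel attention: average- and max-pooled descriptors through the MLP
    avg = x.mean(axis=(1, 2))
    mx = x.max(axis=(1, 2))
    mlp = lambda v: np.maximum(v @ w1, 0.0) @ w2
    ca = sigmoid(mlp(avg) + mlp(mx))              # shape (C,)
    x = x * ca[:, None, None]
    # Spatial attention: channel-wise average and max maps, combined
    sa = sigmoid(x.mean(axis=0) + x.max(axis=0))  # shape (H, W)
    return x * sa[None, :, :]

rng = np.random.default_rng(0)
x = rng.random((4, 6, 6))                         # toy (C, H, W) feature map
out = cbam(x, rng.normal(size=(4, 2)), rng.normal(size=(2, 4)))
```

The output keeps the input dimensions, which is what allows CBAM to be dropped into an existing network without further changes.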
5. Block 4–Block 9
Blocks 4 through 9 constitute three pairings of convolutional layers and max-pooling layers, mirroring the effects of Block 1 and Block 2, albeit with variant filter sizes.
6. Block 10
Block 10 is a fully connected layer that efficiently translates the two-dimensional feature map produced by the convolutional output into a one-dimensional vector. The output generated by this fully connected layer comprises a collection of finely curated features that are subsequently channeled into the final classification or regression process.
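The flattening step can be sketched as follows; the 2 × 3 × 3 feature map and the seven output classes (the seven basic emotions used in datasets such as FER-2013) are illustrative assumptions:

```python
import numpy as np

# Hypothetical 2 x 3 x 3 feature map from the last convolutional block
fmap = np.arange(2 * 3 * 3, dtype=float).reshape(2, 3, 3)
vec = fmap.reshape(-1)            # flattened to a one-dimensional vector of length 18

# Hypothetical fully connected weights mapping the vector to 7 emotion logits
W = np.random.default_rng(0).normal(size=(18, 7))
logits = vec @ W                  # one score per candidate emotion class
```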
7. Block 11 (Output Layer)
The output layer utilizes the softmax function. In scenarios featuring multi-classification tasks, the softmax function acts as the activation function for the neural network output layer, empowering the model to generate probability values that correspond to potential category affiliations of the sample.
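The softmax computation, in its numerically stable form:

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax: shift by the max before exponentiating,
    then normalize so the outputs form a probability distribution."""
    z = logits - logits.max()
    e = np.exp(z)
    return e / e.sum()

p = softmax(np.array([2.0, 1.0, 0.1]))
# p sums to 1, and the largest logit receives the largest probability.
```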
Rights and permissions
Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.
Cite this article
Hu, J., Huang, Z., Li, J. et al. Real-Time Classroom Behavior Analysis for Enhanced Engineering Education: An AI-Assisted Approach. Int J Comput Intell Syst 17, 167 (2024). https://doi.org/10.1007/s44196-024-00572-y