Introduction

Online education is a form of education delivered via Internet platforms. It has become an integral part of the education system in an information-based society, mainly as an auxiliary means of supplementing traditional education. In 2020, traditional education was severely impacted by COVID-19. At the same time, online education rapidly gained popularity and became a primary delivery system for coursework in major universities as well as elementary and secondary schools. Data from Securities Daily show that the user base of China’s online education industry surged from 140 million people in 2016 to 420 million people in 2020. Additionally, the market scale expanded rapidly from 221.8 billion in 2016 to about 500 billion. This growth signifies the importance attached to the online education industry.

Attentional engagement has been a focus of previous research on education (Becker et al., 2021; Zivony & Eimer, 2021). It has been defined as the conscious and sustained allocation of cognitive resources to guide problem-solving, planning, and meaning- and decision-making. In the traditional context of education, students’ attentional engagement is critical to academic achievement (Newmann et al., 1992) and is an essential factor in the effectiveness of student learning. Assessing students’ attentional engagement in the learning process is therefore vital for predicting learning effectiveness. Based on the results of previous studies, many scholars have suggested ways to improve students’ learning efficiency by enhancing attentional engagement, such as using tools (e.g., fingers, pens, and other instruments) when reading text to avoid missing or skipping text or letting the eyes wander (Juuti et al., 2021; Miao et al., 2022). However, the traditional classroom and online education differ in their learning environments and platforms, which makes it difficult to apply research findings on traditional education directly to online education contexts. In traditional classroom education, when students do not follow the material, they are often distracted and perform poorly on subsequent tests. Experienced teachers keep an eye on the attention of their students and adjust their interactions with them accordingly (Wolff et al., 2015). In an online education context, however, teachers do not have access to this immediate feedback, especially when the content is presented in video format. Therefore, student distraction becomes more acute in an online classroom than in a traditional one, where teachers can observe external behavioral signs of students’ attention levels (Means et al., 2009).
To counter the lack of mechanisms for real-time monitoring of student attention in online educational contexts, a valid, real-time means of attention assessment is needed. The traditional method of assessing students’ attentional engagement uses questionnaires (Robinson & Hullinger, 2008). This approach has two limitations. First, these instruments are commonly self-administered scales and are typically subjective. Second, such assessments are often administered at the conclusion of the learning process, so they lag behind it, making it difficult to assess objectively and effectively the level of students’ attentional engagement while learning. Whether they can be used as valid indicators of psychological resource engagement remains unclear (Trowler & Trowler, 2010), which is detrimental to the development of online teaching activities. Online education therefore needs another, objective indicator that teachers can use to regulate student attention. A few scholars have used the number of views and clicks (Koller et al., 2013), participation in learning forums (Brinton et al., 2014; Kizilcec et al., 2013), or length of time spent browsing (Guo et al., 2014) to measure people’s attentional engagement online. Yet these measures do not necessarily correlate with learning effectiveness (Koller et al., 2013). The attentional engagement of students in the context of online education thus requires in-depth investigation.

Eye tracking technology has been widely used to evaluate user interfaces, advertising, and other materials (Bissoli et al., 2019; Bucher & Schumacher, 2006; Yu et al., 2022). For example, research has investigated the effects of typeface figuration (i.e., regular vs. irregular stylization of the form) on the attention and attitudes of advertising viewers, and an eye-tracking-based assistive system for controlling and monitoring smart homes has been proposed (Bissoli et al., 2019). A recent study used eye-tracking methodology to investigate how individuals process electronic word-of-mouth (eWOM) about jobs and employers on social media (Yu et al., 2022). In educational research, eye-tracking technology has been used to improve instructional design, assess learners’ professionalism, or target guided eye movements in teaching (Jarodzka et al., 2017). However, these studies usually examine the direction of the participant’s gaze toward a static image, for example, whether the user is focusing on a specific graphic or reading relevant text content (Yang et al., 2013). Recent analysis of eye movements in text reading has found that when an interruption occurs, there is more rereading behavior, and that the participants who understand the text best are the ones who reread the most (Chevet et al., 2022). This method requires detailed analysis and interpretation of specific content. It cannot be routinely used to assess individual students, and in online education, which is increasingly rich in dynamic stimuli, the content to be assessed by such a measure is increasingly complex (Madsen et al., 2020). It can be argued that static stimuli are not well suited to the context of online education; therefore, the results of eye-tracking studies using static stimuli are difficult to apply to it.

In the context of dynamic content, an obvious indicator of attentional engagement is the gaze point (Deubel & Schneider, 1996). However, this method of assessing visual attention simply determines whether the participant is viewing the stimulus (Madsen et al., 2020). It cannot establish whether the participant is attending to the stimulus content or whether the participant’s eye movements reflect bottom-up attention driven by external stimuli. In addition, a previous review has shown that the locus of attention can be decoupled from the position of the eyes in simple discrimination tasks, but that the link is quite tight in complex information processing tasks such as reading (Rayner, 1998). Previous research has indicated that participants’ eye movements are highly correlated during video presentations (Franchak et al., 2016; Hasson et al., 2008), and that their attentional engagement is higher when viewing dynamic, high-quality movies and television commercials. Pupil size is another potentially useful physiological marker for detecting attention. The pupillary light response (pupils shrink in brightness and expand in darkness) depends not only on a stimulus’s brightness but also on whether the viewer is aware of the stimulus, is paying attention to it, or is even thinking about it (Mathôt & Van der Stigchel, 2015). Van den Brink et al. (2016) investigated the relationship between tonic fluctuations in pupil diameter and performance on a demanding sustained attention task. They found robust linear relationships between baseline pupil diameter and several measures of task performance, suggesting that attentional lapses tended to occur when pupil diameter was small. Thus, the inter-subject correlation (ISC) of pupil size was selected as an indicator to assess attentional engagement. The ISC of eye movements is elevated and influenced by what is being viewed (Smith & Mital, 2013).
The ISC of eye movements can detect whether the participant is actually paying attention to a visual stimulus rather than just looking at it. In contrast, previous approaches to evaluating active attentional engagement have focused on measuring the attentional blink caused by attention to stimuli during rapid serial visual presentation (RSVP) tasks (Becker et al., 2021). However, it is not clear whether the attentional blink is due to the observer’s top-down setting for specific feature values or to the relative features of the target (Zivony & Eimer, 2021). Therefore, ISC is a more valid indicator of active attentional engagement than previously used attentional indicators (Madsen et al., 2020).

Eye tracking technology can also be used for attentional guidance. Eye movement modeling examples are instructional videos in which the eye movements of a domain expert or highly experienced person performing a cognitive task are recorded in real time with an eye-tracking device, along with the corresponding verbal explanations. These are then overlaid on visual material for playback to beginners or less experienced learners (Jarodzka et al., 2012). The theoretical root of this approach is observational learning. It also draws on the cognitive theory of multimedia learning, which is widely recognized in the relevant research. This theory was developed by the American psychologist Mayer (2014), who posited three basic assumptions: (1) dual-channel processing: people process visual and auditory material through separate visual and auditory channels; (2) limited-capacity processing: the amount of information individuals can process in each channel is limited; and (3) active processing: people actively organize and summarize information. If learners wish to make the most of this relatively limited capacity and acquire information in a more meaningful and efficient way, they must efficiently select, organize, and integrate information from the same or different channels to create coherent mental representations. Based on this theory, Mason et al. (2015) demonstrated that eye movement modeling examples helped students develop mental models of graphic relevance and integrate them with material already learned. Eye movement modeling examples temporarily raise students’ learning abilities to higher and more even levels, particularly for students with low levels of reading ability. The impact is smaller for students whose abilities are stronger.

ISC is an indicator that can be used in education to help monitor and improve students’ attention levels during online learning and thereby improve learning effectiveness. Madsen et al. (2020) studied the relationship between ISC and online learning performance. They showed that participants’ eye movements while watching instructional videos were similar, especially when the participants were in a state of high attentional engagement, and that participants in a high attentional engagement state had significantly higher ISC of eye movements during video presentation than participants in a low attentional engagement state. However, the validity of ISC as a predictor of student learning effectiveness remains unclear due to the scarcity of relevant research. Whether the effect reported by Madsen et al. can be replicated in China, given differences in cultural background and participant groups, remains to be determined. In addition, Madsen et al. (2020) reported highly significant correlations between ISC measured in classroom and home webcam contexts and the scores attained on subsequent classroom tests. This result led the researchers to conclude that ISC effectively predicts student performance in exams. However, this study argues that while ISC has merit as an eye-movement indicator of attentional engagement, many factors influence learning performance, and performance cannot be predicted exclusively from students’ attentional engagement levels. For example, there are inherent individual differences in learning ability between students, which means that at the same level of attention, more capable students are better able to pick up on key points, while less capable students may not recall the points made in class as well on a test.

The current study introduces eye movement modeling examples to investigate the correlation between ISC and achievement in more depth using two experiments. The first experiment followed the study design of Madsen et al. (2020) to evaluate the validity and stability of ISC as an indicator for monitoring students’ attentional engagement in an online education context. It measured the difference in ISC between participants in a high attentional engagement state and in a low attentional engagement state. A “standard distraction technique” (Loftus, 1972) was used to induce the state of low attentional engagement: during the second viewing of the video, participants were asked to perform a serial subtraction task (repeatedly subtracting seven, starting from a random prime number between 800 and 1000 that appeared in the instructions) to test the attentional modulation of ISC. The usefulness of such tasks is based upon the assumption that they measure attention and mental concentration (Bristow et al., 2016). On this basis, the second experiment added a sample group with eye movement modeling examples to exclude the effect of participants’ different learning abilities and to observe the correlation between academic performance and attentional engagement levels.

Experiment 1: Verifying the validity of ISC

Methods

Participants

The experiment recruited 27 students (11 males and 16 females aged 19–21 years) from the Zhejiang University of Technology. Three participants were excluded because of missing data or because the thickness of their eyeglass lenses prevented calibration of the eye tracker. The final sample comprised 24 participants, all of whom had normal or corrected-to-normal vision. Participants provided informed consent prior to the start of the experiment and received compensation after it finished.

Materials and instruments

The experiment was conducted using a desktop eye-tracking system, the EyeLink 1000, which recorded eye movements from the participant’s right eye at a sampling frequency of 500 Hz. The experiment was programmed using Experiment Builder and ran on a computer running the Windows 10 Professional operating system. The experimental stimuli were presented on a 24-inch LCD monitor (model 1JGVWR2) with a screen refresh rate of 144 Hz and a resolution of 1920 × 1080 pixels. A chin rest was used to hold the participant’s head at 82.1 cm from the monitor throughout the experiment.

A previous related study (Madsen et al., 2020) found that ISC differed significantly between videos. This discrepancy may be due to the contents of the videos eliciting varying levels of attention or to the different ways visual dynamics drive bottom-up attention. Therefore, the videos selected for this study were short, dynamic, and showed the teacher’s face engaged in enthusiastic conversation. This study selected popular teaching videos on four topics: immunology, astrophysics, artificial intelligence, and innovative thinking. Test questionnaires were developed on a Chinese university’s massive open online course (MOOC) teaching platform. In each selected video, a single teacher is positioned in the center of the frame, and the area the teacher occupies is taller than it is wide. During teaching, the teacher’s face and body move along with the explanation, which enhances the sense of interaction between teacher and students. The questionnaire consisted of eight to nine four-option single-choice questions related to the content of the videos, including two to three comprehension questions; the rest were re-identification questions. Six participants were recruited for the pre-experiment and completed the test after watching each of the four videos. The percentages of questions answered correctly for the four topics were: immunology 89%, astrophysics 67%, artificial intelligence 89%, and innovative thinking 78%. Based on the participants’ test results, the 7 min 50 s video on innovative thinking and its questionnaire were selected as the official experimental material, consisting of three comprehension questions and six re-identification questions with a score of 9 out of 10.

Procedure and design

Based on previous relevant studies (Madsen et al., 2020), the experiment used a single-factor within-participant design (attentional engagement state: high vs. low), with the participant’s ISC value as the dependent variable. The experiment was divided into a calibration phase and a formal experimental phase.

In the calibration phase, a nine-point calibration method was used: the participant followed the calibration point with their gaze as it appeared at each position in turn, and the eye tracker checked whether the pupil was tracked adequately at each point. Calibration results were scored as poor, fair, or good. The experiment proceeded once the participant attained at least the fair level.

During the formal phase of the experiment, participants were first told that they would complete a test after watching the instructional video, so that they watched it in a state of high attentional engagement. After completing the test, participants were asked to watch the video again. During this second viewing, they performed a serial subtraction task (repeatedly subtracting seven, starting from a random prime number between 800 and 1000 that appeared in the instructions), so that they watched the video in a state of low attentional engagement. This is a standard distraction technique (Loftus, 1972) and was used to test the attentional modulation of ISC.
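The serial subtraction task can be illustrated with a short sketch. The Python below is not part of the study’s materials. It assumes that the starting number is drawn uniformly from the primes between 800 and 1000 and that a fixed number of expected answers is generated for checking, neither of which the procedure specifies; all function names are hypothetical.

```python
import random


def is_prime(n):
    """Return True if n is a prime number (trial division)."""
    if n < 2:
        return False
    divisor = 2
    while divisor * divisor <= n:
        if n % divisor == 0:
            return False
        divisor += 1
    return True


def random_start(rng, low=800, high=1000):
    """Pick a random prime in [low, high] (assumed uniform draw)."""
    primes = [n for n in range(low, high + 1) if is_prime(n)]
    return rng.choice(primes)


def serial_sevens(start, steps):
    """Expected answers when seven is subtracted repeatedly from start."""
    return [start - 7 * k for k in range(1, steps + 1)]


# Example: a seeded generator makes the draw reproducible.
rng = random.Random(2024)
start = random_start(rng)
expected_answers = serial_sevens(start, steps=5)
```

Starting from a prime means the running total never lands on a round multiple of seven, which plausibly keeps the task demanding throughout the second viewing.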

Statistical analysis

The ISC was calculated as follows, using vertical eye movements as an example. Step 1: calculate the Pearson correlation coefficient between the vertical gaze position time series of one participant and that of each of the other participants. Step 2: average these correlation values to obtain the ISC value for that participant. Step 3: repeat Steps 1 and 2 for every participant to obtain an individual ISC value for each. The three steps were then repeated for the participants’ horizontal eye movement and pupil size data, yielding ISC values for three dimensions: ISC vertical, ISC horizontal, and ISC pupil. To obtain a single metric for the subsequent analyses, we averaged the three ISC values and used the result as the overall ISC value for each participant: ISC = (ISC vertical + ISC horizontal + ISC pupil)/3. In this experiment, the ISC values of participants in the high attentional engagement state were calculated separately from those in the low state and imported into SPSS for paired samples t tests.
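The three steps and the final averaging can be sketched in a few lines of Python. This is an illustrative sketch rather than the analysis code used in the study: it assumes that every participant’s time series has already been aligned and has the same length, and the function names are hypothetical.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation between two equal-length sequences.

    Raises ZeroDivisionError if either sequence is constant.
    """
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def isc_per_participant(signals):
    """Steps 1 to 3: for each participant, average the correlations
    between that participant's series and every other participant's."""
    n = len(signals)
    result = []
    for i in range(n):
        others = [pearson(signals[i], signals[j]) for j in range(n) if j != i]
        result.append(sum(others) / len(others))
    return result


def overall_isc(vertical, horizontal, pupil):
    """ISC = (ISC vertical + ISC horizontal + ISC pupil) / 3 per participant."""
    isc_v = isc_per_participant(vertical)
    isc_h = isc_per_participant(horizontal)
    isc_p = isc_per_participant(pupil)
    return [(v + h + p) / 3 for v, h, p in zip(isc_v, isc_h, isc_p)]


# Toy data (made up for illustration): three participants, four samples each.
toy = [[1, 2, 3, 4], [2, 4, 6, 8], [4, 3, 2, 1]]
toy_isc = isc_per_participant(toy)
```

In the toy data, the first two participants move together and the third moves in the opposite direction, so the third participant receives the lowest ISC value.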

Results

Experiment 1 examined the ISC values of participants in two attentional engagement states. To align the X- and Y-coordinates of the individual participants’ gaze points and their pupil sizes as time series, the study used the data from the second frame to the penultimate frame of the video. Data containing errors introduced because the eye tracker was not stopped at the end of the experiment were removed. The results are shown in Table 1.

Table 1 Changes in the X and Y-coordinates of the gaze point, changes in pupil size and total ISC for each subject in the high and low engaged states

According to paired samples t tests, the ISC values of the X-coordinate of the gaze point were significantly higher in the high attentional engagement state than in the low attentional engagement state (t = 15.589, p < 0.001, d = 0.89). The same held for the ISC values of the Y-coordinate (t = 6.508, p < 0.001, d = 0.72) and of pupil size (t = 7.556, p < 0.001, d = 0.72). The total ISC values for participants in the high attentional engagement state were also significantly higher than for those in the low state (t = 16.083, p < 0.001, d = 0.91).
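For readers who want to see the arithmetic behind a paired samples t test, the sketch below computes the t statistic, the degrees of freedom, and an effect size from two lists of ISC values. It is not the SPSS procedure used in the study. The effect size shown is the mean difference divided by the standard deviation of the differences, which is one common convention and is an assumption here, because the text does not state which formula for d was used. The input values are made up.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(high, low):
    """Paired samples t test on two equal-length lists.

    Returns (t, degrees_of_freedom, d_z), where d_z is the mean
    difference divided by the SD of the differences (assumed convention).
    """
    diffs = [h - l for h, l in zip(high, low)]
    n = len(diffs)
    sd = stdev(diffs)                  # sample SD of the differences
    t = mean(diffs) / (sd / sqrt(n))   # t statistic with n - 1 df
    d_z = mean(diffs) / sd
    return t, n - 1, d_z


# Hypothetical ISC values for four participants in each state.
high_state = [0.5, 0.6, 0.7, 0.8]
low_state = [0.1, 0.3, 0.2, 0.4]
t_value, df, effect = paired_t(high_state, low_state)
```

A p value additionally requires the t distribution, which a statistics package provides (for example, `scipy.stats.ttest_rel` in SciPy).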

The correlation between ISC and test scores in the high attentional engagement state was then analyzed. Pearson correlation coefficients showed that ISC for the X-coordinate of the gaze point was weakly correlated with test scores (r = 0.252), as was ISC for the Y-coordinate (r = 0.275), whereas ISC for pupil size was not correlated with test scores (r = −0.107). Total ISC values were weakly correlated with test scores (r = 0.172).
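One way to read these coefficients is through the squared correlation, which gives the proportion of variance that two variables share under a linear relationship. The short sketch below applies this to the coefficients reported above; the dictionary keys and function name are illustrative labels, not terms from the study.

```python
# Pearson correlations between ISC and test scores reported for Experiment 1.
reported_r = {
    "ISC X": 0.252,
    "ISC Y": 0.275,
    "ISC pupil": -0.107,
    "ISC total": 0.172,
}


def shared_variance_percent(r):
    """Percentage of variance shared by two variables with Pearson r."""
    return 100 * r * r


shared = {name: shared_variance_percent(r) for name, r in reported_r.items()}

for name, pct in shared.items():
    print(f"{name}: r = {reported_r[name]:+.3f}, shared variance = {pct:.1f}%")
```

Every value stays below 8%, which is consistent with describing the relationships between ISC and test scores as weak.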

Experiment 2: Using attentional guidance to enhance the effectiveness of teaching and learning

Methods

Participants

This experiment recruited 29 students (11 males and 18 females, aged 19–22) from Zhejiang University of Technology. As in Experiment 1, participants whose eyeglass lens thickness prevented calibration of the eye tracker were excluded; accordingly, only participants with myopia below 500 degrees were recruited for this experiment. Two participants were excluded due to missing data. One participant failed to read the instructions correctly and ignored the follow-up test. All participants had normal or corrected-to-normal visual acuity, signed an informed consent form before the start of the experiment, and were compensated after the experiment.

Materials and instruments

The video material from Experiment 1 was used as the basis for creating the eye movement modeling example. The eye movements were recorded from a student who had studied the material beforehand and who was informed before the recording that his eye movement data would be used for teaching purposes. After the student had finished studying the video, his gaze trajectory was exported from the eye tracker to produce the example. The production process was as follows: only gaze points lasting 500 ms or more were included; each gaze point was drawn in red with a size scaled to gaze time (100 pixels corresponds to 500 ms of gaze time), so that the longer the gaze time, the larger the gaze point; and each gaze point faded out over 0.5 seconds. In addition, to avoid extra interference, gaze points that were clearly off the focus of the lecture were removed, and only those within the center of the video frame were retained.
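The production rules for the gaze markers can be summarized in a small sketch. It is an illustration under stated assumptions, not the tool used to produce the stimulus: marker size is assumed to scale linearly with gaze time (anchored at 100 pixels for 500 ms), the manual removal of off-focus gaze points is not modeled, and the `Fixation` type and function names are hypothetical.

```python
from dataclasses import dataclass

MIN_DURATION_MS = 500   # gaze points shorter than this are dropped
PIXELS_PER_500_MS = 100 # 100 pixels corresponds to 500 ms of gaze time
FADE_OUT_S = 0.5        # each marker fades out over 0.5 seconds
MARKER_COLOR = "red"


@dataclass
class Fixation:
    x: float            # horizontal position in pixels
    y: float            # vertical position in pixels
    duration_ms: float  # gaze time in milliseconds


def marker_for(fixation):
    """Return the marker description for one gaze point, or None if
    the gaze point is too short to be shown."""
    if fixation.duration_ms < MIN_DURATION_MS:
        return None
    # Assumed linear scaling: longer gaze time gives a larger marker.
    size_px = fixation.duration_ms * PIXELS_PER_500_MS / 500
    return {
        "x": fixation.x,
        "y": fixation.y,
        "size_px": size_px,
        "color": MARKER_COLOR,
        "fade_out_s": FADE_OUT_S,
    }


def build_markers(fixations):
    """Keep only gaze points that meet the duration threshold."""
    markers = (marker_for(f) for f in fixations)
    return [m for m in markers if m is not None]


# Made-up gaze points: the first is too short and is dropped.
example = [Fixation(960, 540, 400), Fixation(960, 540, 500), Fixation(900, 500, 1000)]
example_markers = build_markers(example)
```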

Procedure and design

The experiment was a single factor between-participants design (whether or not the participant received the eye movement modeling examples: with the eye movement modeling examples vs. without the eye movement modeling examples), with the dependent variable being the correlation coefficient between the participant’s ISC and test scores. Additionally, the experiment was divided into a calibration phase and a formal experimental phase, and the calibration part was identical to Experiment 1.

During the formal experiment, participants in Experiment 1’s high-attention state were designated as the no-eye movement modeling examples group. In the eye movement modeling examples group, participants were told that they would be watching an instructional video and that red dots would appear from time to time in the video. They were further informed that these dots represented material that a previous student who had studied this video in-depth had reported as requiring focus and that they should watch the video carefully and complete a follow-up test.

Results

Experiment 2 examined the ISC values of participants with and without the eye movement modeling examples. To align the horizontal and vertical coordinates of the individual participants’ gaze points and their pupil sizes as time series, the study used the data from the second frame to the penultimate frame of the video. Data containing errors introduced because the eye tracker was not stopped at the end of the experiment were removed. The results are shown in Table 2.

Table 2 Changes in the X and Y-coordinates of the gaze point, changes in pupil size and total ISC for each subject with the eye movement modeling examples

Table 2 shows the ISC values of the participants in the eye movement modeling examples group and in the non-eye movement modeling examples group. According to an independent samples t test, the ISC values of the two groups were significantly different (t = 2.847, p = 0.006, d = 0.34); they were significantly higher in the non-eye movement modeling examples group than in the eye movement modeling examples group. In addition, an independent samples t test on the groups’ test scores found that scores in the group with the eye movement modeling examples were significantly higher than in the group without them (t = −2.106, p = 0.040).
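The sign of an independent samples t statistic depends on which group is entered first, which is why one comparison above is positive and the other negative. The sketch below shows the arithmetic for the pooled-variance (Student) version of the test. Whether the study assumed equal variances is not stated, so this choice is an assumption, and the input values are made up.

```python
from math import sqrt
from statistics import mean, variance


def independent_t(group_a, group_b):
    """Student's independent samples t test with pooled variance.

    Returns (t, degrees_of_freedom, cohens_d). A negative t means the
    mean of group_a is lower than the mean of group_b.
    """
    n_a, n_b = len(group_a), len(group_b)
    df = n_a + n_b - 2
    pooled_var = ((n_a - 1) * variance(group_a)
                  + (n_b - 1) * variance(group_b)) / df
    diff = mean(group_a) - mean(group_b)
    t = diff / sqrt(pooled_var * (1 / n_a + 1 / n_b))
    d = diff / sqrt(pooled_var)  # Cohen's d using the pooled SD
    return t, df, d


# Hypothetical test scores: group without examples first, then with examples.
without_examples = [1, 2, 3, 4, 5]
with_examples = [2, 3, 4, 5, 6]
t_value, df, d_value = independent_t(without_examples, with_examples)
```

Swapping the two arguments flips the sign of both t and d without changing their magnitudes.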

In the eye movement modeling examples group, the correlation between ISC and test scores was weak for ISC at the X-coordinate of the gaze point (r = 0.063) and weak for ISC at the Y-coordinate (r = 0.132). There was no correlation between ISC for pupil size and test scores (r = 0.096). In addition, there was a weak correlation between total ISC values and test scores (r = 0.132).

Discussion

The present research examined the effectiveness of using synchronized eye movements to assess attentional engagement through two experiments. The first experiment examined the inter-subject correlation (ISC) of participants’ eye movements under different conditions (high vs. low attentional engagement). In the second experiment, participants watched the video material with eye movement modeling examples, which were intended to improve teaching effectiveness. The results showed that the ISC values of the changes in the X- and Y-coordinates of the participants’ gaze points were significantly higher in the high attentional engagement state than in the low attentional engagement state. However, the ISC values of the X-coordinate changes were higher than those of the Y-coordinate changes in both the high and low attentional engagement states. Hence, the consistency of gaze points across participants was higher for the X- than for the Y-coordinates, which may have been due to the presence of a lifelike teacher in the video material selected for this experiment. According to Social Agency Theory, a teacher’s presence in a video promotes a sense of interaction between the teacher and the learner. This sense of teacher-learner interaction improves cognitive processing and learning performance (Mayer, 2014). In online teaching, the teacher’s presence provides cognitive and social cues throughout the course. These cues influence student attention, engagement, and learning effectiveness.

In this experiment, the area occupied by the teacher was much taller than it was wide. When the video was presented, the teacher’s presence affected the participants’ attention, and the participants attended closely to the teacher. Some participants focused on the teacher’s face, while others focused on the teacher’s hands. This discrepancy resulted in considerable variation in the Y-coordinates of the participants’ gaze. Furthermore, some participants occasionally looked at the video’s progress bar and its end time. This behavior also reduced the ISC value of the change in the Y-coordinate of the participants’ gaze points.

Furthermore, ISC values for pupil size changes in the high attentional engagement state were also significantly higher than those in the low attentional engagement state, suggesting that pupil size can be used as a constituent of the ISC index to predict the level of attentional engagement. Previous studies have shown that three components of the attentional network (the alerting, orienting, and executive control networks) elicit pupil dilation of varying duration and intensity. It has been suggested that pupil change may serve as an indicator of the function of the attentional network (Petersen & Posner, 2012). Furthermore, baseline pupil diameter and task-evoked pupil responses are closely related to individual cognitive performance (Lu et al., 2021). This study focused on task-evoked pupil responses while controlling for confounding factors such as environmental and stimulus luminance and participant age. Future ISC studies could also incorporate baseline states. The present study provides a degree of empirical support for the previous view by verifying the validity of pupil size as an indicator of attentional engagement. Although this study verified the validity of the ISC as an indicator of attentional engagement, the ISC included only three oculomotor indicators, namely the changes in the X- and Y-coordinates of the gaze point and pupil size. In future studies, more oculomotor indicators, such as first fixation time and regression (look-back) time, could be tested and included to improve the stability and validity of the ISC. In addition, the video material used to collect eye movement data in the two experiments was approximately eight minutes long, and the effects of ISC in briefer learning sessions were not considered. Subsequent studies could include briefer learning situations to determine the robustness of the ISC effect and to enhance its applicability in online education.

However, whether the low attentional engagement manipulation simulates a natural state of distraction remains unclear. When testing the ISC’s validity in assessing attentional engagement, it may not be appropriate to force distraction in a way that leaves participants unable to attend to the video, because this may not be consistent with the distracted state of learners watching such instructional videos in a natural setting. This may imply that the ISC is less reliable as an indicator of attentional engagement. However, the causes and behaviors of distraction in natural environments are so diverse that it is difficult to reproduce such scenarios fully and exactly in a laboratory setting. Given that previous researchers have used this approach and a more suitable one has not yet been established, the classic distraction task of serial subtraction was adopted, with the expectation that future research will develop a more credible distraction task for such studies.

The statistical results showed that the eye movement modeling examples improved participants’ overall performance while decreasing their ISC. This result suggests that the attentional guidance provided by the eye movement modeling examples had a positive effect on teaching effectiveness. Nevertheless, informing the participants in advance that they needed to pay special attention to the guidance material (the specific requirement to pay attention to the red dots) may have caused tension and thus affected their attention allocation. The situation may be similar to that of a referee in a football match, whose mental activity is highly variable and whose attention must be maintained under tense conditions. This state can lead to over-concentration of attention and a failure to allocate and shift attention in time to achieve good observation and judgment, resulting in misjudgments such as blowing the whistle or handling game disputes prematurely. Therefore, the decrease in ISC in the eye movement modeling examples group compared to the non-eye movement modeling examples group cannot be seen as a reduction in attentional engagement in this experiment. Instead, it could reflect an attentional allocation problem due to high concentration. Even in the eye movement modeling examples group, the ISC values were significantly higher than those of the low attentional engagement participants performing the distraction task. Moreover, even when participants were briefly raised to a higher level of learning ability using the eye movement modeling examples, a high correlation between ISC values and performance was still not found.