Theory-based approach for assessing cognitive load during time-critical resource-managing human–computer interactions: an eye-tracking study

Sevcenko, Natalia; Appel, Tobias; Ninaus, Manuel; Moeller, Korbinian; Gerjets, Peter

doi:10.1007/s12193-022-00398-y

Theory-based approach for assessing cognitive load during time-critical resource-managing human–computer interactions: an eye-tracking study

Original Paper
Open access
Published: 28 November 2022

Volume 17, pages 1–19, (2023)
Cite this article

Download PDF

You have full access to this open access article

Journal on Multimodal User Interfaces Aims and scope Submit manuscript

Theory-based approach for assessing cognitive load during time-critical resource-managing human–computer interactions: an eye-tracking study

Download PDF

3761 Accesses
Explore all metrics

Abstract

Computerized systems are taking on increasingly complex tasks. Consequently, monitoring automated computerized systems is becoming increasingly demanding for human operators, which is particularly relevant in time-critical situations. A possible solution might be adapting human–computer interfaces (HCI) to the operators’ cognitive load. Here, we present a novel approach for theory-based measurement of cognitive load based on tracking eye movements of 42 participants while playing a serious game simulating time-critical situations that required resource management at different levels of difficulty. Gaze data was collected within narrow time periods, calculated based on log data interpreted in the light of the time-based resource-sharing model. Our results indicated that eye fixation frequency, saccadic rate, and pupil diameter significantly predicted task difficulty, while performance was best predicted by eye fixation frequency. Subjectively perceived cognitive load was significantly associated with the rate of microsaccades. Moreover our results indicated that more successful players tended to use breaks in gameplay to actively monitor the scene, while players who use these times to rest are more likely to fail the level. The presented approach seems promising for measuring cognitive load in realistic situations, considering adaptation of HCI.

Development of a Quantitative Evaluation Tool of Cognitive Workload in Field Studies Through Eye Tracking

Insights from Eye Movement into Dynamic Decision-Making Research and Usability Testing

Cognitive Workload of Humans Using Artificial Intelligence Systems: Towards Objective Measurement Applying Eye-Tracking Technology

Use our pre-submission checklist

Avoid common mistakes on your manuscript.

1 Introduction

Our daily lives are becoming more and more automated, with computerized systems such as autopilots, speed and lane assistants and robotic surgeons taking on increasingly complex tasks. Thus, a human operator is supported considerably by such systems, however, monitoring of such automated systems is becoming increasingly demanding due to the rising complexity of tasks they are capable of executing. This problem is particularly evident in unexpected time-critical situations, where the operator does not have time for a detailed analysis of the interface and therefore relies more than usual on cognitive ergonomics.

A possible solution to this challenge may be the development of intelligent human–computer interfaces (HCIs) capable of adapting their demands and appearance (e.g., minimizing displayed information when the operator is cognitively overloaded) to the situation as well as to the operators’ cognitive and emotional states to optimize performance and prevent failures. The so-called cognitive load on the operator seems to be particularly relevant in this case, because it is considered to reflect the degree to which available cognitive resources are engaged in the task at hand [1] and thus can be used to predict operators’ performance. Accordingly, the detection of the actual degree of cognitive load appears to be of great importance in a variety of realistic settings [2]. For example, based on this information, the appearance of the user interface could be adapted appropriately.

Empirical evidence indicates that differently designed HCIs may induce different levels of cognitive load while performing the same task. As one example, Charabati et al. [3] compared interfaces designed to monitor anesthesia parameters during a surgery and reported that participants rated their cognitive load significantly lower when using a mixed numerical–graphical interface compared to the numerical and advanced-graphical interfaces. Another example was provided by the study of Oviatt [4] who found that students performed significantly better at solving mathematical problems when using a digital pen and paper interface compared to graphical tablet interfaces. At the same time, online adaptation of the environment to the operators’ current cognitive load was shown to lead to a significant performance improvements [5].

This raises the question of which measurement methods and metrics are best suited for the assessment of cognitive load in such systems. Fortunately, digital systems easily allow for collecting individual user data that may well be used to model cognitive or emotional states of the operator [6] including but not limited to cognitive load. Data driven approaches such as machine learning [2] appear very promising in this regard, however, they mostly require specific calibration and cannot be easily generalized to different users and settings. We aim to address the question whether this limitation might be solved by using a top-down approach, that is, detecting cognitive load states based on a suitable theoretical framework. By doing so, we hope to avoid additional calibrations and achieve good genaralizability of the method, when applying the proposed measurement system to similar settings. Specifically, in this study we investigated whether eye-tracking features collected during initial burst and initial idle time intervals (see Sect. 1.3.1), and calculated in a top-down procedure based on a theoretical model, may be successfully used to measure cognitive load during time-critical emergency simulation.

This article is structured as follows. The next section (Sect. 1.1) provides a brief introduction into concept of cognitive load and its measurement, while specifically addressing eye-tracking method and eye-tracking features which were used in this study (Sect. 1.2). The subsequent section (Sect. 1.3) describes the time-based resource-sharing model [7], which we chose as a theoretical base for our later calculations. Section 1.3.1presents the temporal action density decay metric [8], derived from the model and provides introduction into calculation of burst and idle time periods, which are later used for recording of eye-tracking features. Subsequently the Materials and Methods (Sect. 2) and Results (Sect. 3) are described and discussed (Sect. 5).

1.1 Cognitive load

The concept of cognitive load is based on the realization that cognitive resources are limited [9]. It can be understood as the degree of “how hard the brain is working to meet task demands” [10]. At the same time, it needs to be considered that cognitive load evolves as a complex interplay between different task demands and mental processes [1], and thus represents a dynamic variable that fluctuates during task accomplishment.

An association between cognitive load and human performance was demonstrated in a variety of realistic settings such as e-learning [4, 5], transportation [11,12,13,14,15], aviation [16], office work [17, 18] and medicine [19]. It has been observed, that this relation seems to be shaped like an “inverted-U” [20] with best performance under medium cognitive load. Additionally, it seems to be associated with Csikszentmihalyis’ concept of “flow” [21, 22], which characterizes a state of total concentration on the task at hand and also assumes that performance usually declines when cognitive demands are too low or overstraining [e.g., 20, 23,24,25]. As such, this indicates that human–computer interaction might be optimized by keeping the operators’ cognitive load at a medium level [26].

The vision of an adaptive HCI to provide an optimal driver support was already expressed by Michon [27] in 1993. Since then, several attempts have been made to develop adaptive systems. As one example, BMW [28] has developed a system that diverts incoming calls to voicemail when driver cognitive overload is detected. Therefore, the cognitive state of the driver was estimated based on the traffic situation and driving dynamics. Toyota also took a similar approach by resetting voice messages when the driver was overloaded; thereby the authors estimated the cognitive state based on the use of the accelerator pedal. Comparatively, researchers from Daimler [29] used motion dynamics in the car seat to estimate cognitive load. Another way to reduce cognitive load in a car consists in optimizing intervals between incoming messages [30]. Messages that appear too frequently or concurrently can increase the driver's cognitive load and impair driving performance, whereas an adaptive extension of the intervals between messages reduces cognitive load and improves driving performance. [31]. As another example, Kohlmorgen, et al. [32] substantiated this consideration by showing that an adaptive reduction of cognitive load improved driving performance under real traffic conditions.

Also in aviation the research of cognitive load has a long tradition, with first studies aiming to detect cognitive states in pilots date back to the 1980 and 1990s [33, 34]. In subsequent years, the idea of developing intelligent assistive systems adapted to cognitive load came to the foreground. As one example, adaptive cognitive agent [35, 36] was developed to individually support helicopter crew members. The cognitive load estimator detectes states of cognitive overload based on task load and behavioral data and adaptively decided whether the operator needs support. Another adaptive cognitive agent was developed to help with air traffic management, and was shown [37] to significant improve operators’ performance. As another example, Wilson and Russell [38] used neural network to detect states of high cognitive load during a simulated uninhabited aerial vehicle task based on (neuro-) psychophysiological data and found that adaptive aiding may significantly enhance pilots’ performance.

Besides automotive and aviation fields, such highly relevant contexts as medicine, emergency management and education were also shown to benefit from the adaptation to cognitive load. To give a few of examples, Sarkar et al. [39] designed a multitasking deep neural network to classify high and low cognitive load states in experts and novices while performing a trauma simulation based on electrocardiogram (ECG). Mirbabaie and Fromm [40] developed an augmented reality support system to support the emergency management during realistic emergencies. Yuksel et al. [41] reported on the benefits of adaptively increasing task complexity on learning performance in pianists, While, Walter, et al. [5] observed significant improvement in learners’ math performance when using an electroencephalography (EEG)-based adaptive learning environment.

Taken together, this brief summary indicates that online adaptation to cognitive load is beneficial as well as practicable. Already achieved results appear promising for a variety of realistic contexts while this research field remains vibrant. Thus, considering that cognitive load is a dynamic variable that fluctuates over time during task completion, we need a measurement method able to capture cognitive load at the early stages of task processing to adapt HCI to it in a timely manner, because the earlier a non-responsive cognitive state can be detected, the earlier HCI can be adapted accordingly.

Measurement techniques of cognitive load can be classified into four main categories: (i) Performance-based, (ii) Subjective, (iii) Behavioral, and (iv) Physiological measurements [42,43,44]. Performance-based measurements rely upon user performance, for example, the rate of correct responses while solving a sequence of arithmetic tasks. These measurements are hardly applicable in the context of human–computer interaction (HCI), where intermediate results can seldom be identified. Subjective measurements are usually obtained by (standardized) questionnaires such as SWAT [45] and NASA-TLX [46]. These measurements are well validated, easy to apply, and highly reliable. Unfortunately, they can primarily be collected after the task has already been completed and thus are hardly applicable for online assessment of cognitive load when aiming for timely HCI adaptation. Behavioral measurements rely on the analyses of differences in operators’ interaction behavior with the system, such as mouse usage, click rates, etc. They potentially allow for online evaluation of fluctuations in cognitive load. However, behavioral measurements might be influenced by factors other than the task at hand such as attentional or motivational processes [e.g., 47]. Finally, physiological measurements (e.g., measurements of heart rate variability and electrodermal activity, electroencephalography (EEG), functional magnetic resonance imaging (fMRI), eye-tracking) relate physiological parameters to psychological constructs including cognitive load [44, 48,49,50]. They seem very promising for the development of adaptive systems because they allow for continuous recording of the respective variables and thus online adaptation. However, they often require costly equipment and sophisticated methods of data analysis. Moreover, some physiological methods are hardly feasible in realistic HCI environments [for an overview see 51] due to their immobility (e.g., fMRI) or high noise sensitivity (e.g., EEG).

1.2 Eye-tracking

One physiological method which is gaining increasing popularity when measuring cognitive load is eye-tracking. In particular the subtle realization of modern video-based eye-tracking systems [for a general overview of eye tracking methodology, see: 52] makes this technique potentially promising for the commercial development of user-friendly adaptive HCIs. In this article, we focused on fixations, blinks, saccades, microsaccades, and pupil diameter as specific indices of participants’ oculomotoric behavior because, as described in more detail below, evidence has shown that these features respond to fluctuations in cognitive load.

1.2.1 Fixations

Voluntarily controlled stable gazes that usualle last for about 200–300 ms are called fixations [53]. During these periods eyes stay relatively still, while the person processes information from the fixation area [54]. The relation between cognitive load and fixation time seems to depend on the task at hand. There is evidence that increased task complexity is associated with fewer but longer fixations [for reviews see: 53, 55]. For example, Chen, et al. [56] concluded that increased fixation duration and decreased fixation rate indicate increased attentional effort on a more demanding task. Similarly, De Rivecourt, et al. [57] found that increased task complexity is associated with longer fixations on the control instruments during simulated flight. Contrarily, however, Van Orden et al. [58] observed fixation frequency to systematically increase with the visual complexity of a target classification task.

1.2.2 Saccades

Eye movements between two fixations that allow for exploration of the surroundings and attention control are called saccades. Similarly to fixation frequency, saccadic rate also appears to be strongly dependent on the type of the task. While increasing difficulty of the visual search task was shown to lead to increased saccadic rate [59], there is also evidence that saccadic rate decrease with the difficulty of the non-visual task [60].

1.2.3 Microsaccades

If our eyes would stay completely still during fixation, the visual image would gradually fade because neural response weakens with constant stimulation [54]. Microsaccades are small unintentional eye movements, which cover less than 1° of visual angle and prevent currently viewed visual information from fading [54]. Evidence suggests that microsaccadic frequency increases with increasing visual complexity of the task at hand [61], whereas in non-visual tasks microsaccadic rate seems to decrease and microsaccadic magnitude to increase with task difficulty [62, 63].

1.2.4 Blinks

A commonly known function of blinking consists in moisturizing the eyeball and protecting it from physical damage. Besides that, in addition to microsaccades, blinking is also needed to prevent perceptual fading [64]. Moreover, bursts of blinks seem to occur before and after periods of intense information processing [65]. Additionally, high blink rates were found associated with higher cognitive load [60].

1.2.5 Pupil dilation

The main function of the eye pupil is controlling the amount of light that enters the eye to achieve the best possible visual perception. This metric is most commonly considered in cognitive load research [for a general overview see: 66]. In states of high cognitive load, pupil diameter was repeatedly observed to increase proportionally both in visual and non-visual tasks [60, 67, 68]. Hovewer, besides of this association, pupil size responds to a relatively broad number of stimulate reaching from intake of drugs [69, 70] to such internal states as, e.g., interest and arousal [71, 72]. Therefore, it is not trivial to determine what the exact reason for the observed pupil dilation was, and interpretation of the results must be done with caution.

2 Summary

Taken together, eye-tracking seems to be a promising technique well-suitable for assessing cognitive load during HCI. Empirical evidence indicates that the eye-tracking measures listed above fluctuate dynamically over time [65]. Hence it might be advantageous to know the exact on- and off-set of each stimulus to be able to effectively differentiate between states of low and high cognitive load. While this premise is easy to achieve in a controlled laboratory setting, realistic HCI usually consists of a variety of interlocking tasks and stimuli that are impossible to analyze separately. Therefore, we chose a specific analytic approach. Instead of analyzing eye-tracking data for the entire time course of HCI, we focused on the analysis of most relevant time periods determined according to an established theoretical approach, allowing for better generalizability of these calculations to similar situations. In the context of time-critical interactions under severe time restraints, the time-based resource-sharing (TBRS) model [7] briefly described below provided a suitable theoretical basis for assessing cognitive load.

2.1 Time-based resource-sharing model

The main idea proposed in the TBRS model by Barrouillet, et al. [7] is that, in addition to task complexity, cognitive load also strongly depends on available time, which is particularly relevant in time-critical situations. The model describes working memory as a core system of cognition consisting of two processes indispensable for the execution of a cognitive task: information storage and processing. According to the model, both components require attention, to switch between subtasks resulting in complex and time-critical interactions between them and eventually causing interruptions in the processing of subtasks. Based on these assumptions, TBRS predicts that cognitive load, and thus performance “depends on the proportion of time during which attention is captured in such a way that the storage of information is disturbed” [7]. However, the authors acknowledge that it is not trivial to determine these time intervals.

2.1.1 Temporal action density decay metric

In a recent study [8], authors addressed this challenge by proposing the temporal action density decay (TADD) metric, which is based on the TBRS model and was developed to estimate cognitive load in time-critical situations that require resource management. According to this approach, such situations can be divided into a series of so-called action blocks consisting of active phases (burst), in which resources are managed, and waiting phases (idle), in which all resources are occupied or unavailable and one must wait until a new task appears or a resource becomes available again (for a detailed example see Fig. 1). The ratio of the length of the first detected burst (initial burst) to the length of the first action block (initial action block: occurred right at the beginning of each level), was shown to significantly predict performance [8]. In fact, it turned out that participants, who completed their tasks faster and therefore had to wait longer at the beginning of the level, were also significantly more likely to successfully complete the respective level.

2.2 Present study

Although the TADD metric has been shown to be significantly related to cognitive load, it is a behavioral measure that does not allow direct conclusions to be drawn about the cognitive activities occurring at that time. While participants’ cognitive engagement during the initial burst can be estimated based on logged in-game actions, this method cannot be used for the initial idle period, because no actions are performed during this time. It is conceivable that some participants might use the initial idle for relaxation, which would be reflected in decreased cognitive load, whereas others might use this time for planning and visual screening of the scenery, which we expect to lead to an increased level of cognitive load as compared to the first group.

In this study, we aim to further investigate the initial burst and idle time periods while playing the time-critical serious game using the eye-tracking method. Therefore we obtain fixations, blinks, saccades, microsaccades, and pupil diameter during these time intervals. Based on evidence we expect fixation frequency [58, 73] and saccadic and microsaccadic rates [61, 73] to increase in response to visual load and to decrease [56, 57, 60, 62, 63] in response to non-visual cognitive load. Based on previous evidence we also expect pupil dilation [67, 68, 73] and blinking rate [60, 65] to increase with increased difficulty.

We expect that cognitive behavior during the initial idle time period should have impact on gaming success. Specifically, we expect that higher visual and cognitive activity during this time will help maintain and promptly update the cognitive model of the game scene, leading to better reaction times and thus better game success for participants who are more active during this time.

On the other hand, based on the theoretical considerations from Sevcenko et al. [8], we expect all participants to work at their limit, that is, to have maximum cognitive load during the initial burst phase. For this reason, we did not expect any relationship between eye tracking data during the initial burst phase with task difficulty as well as their subjective assessment of cognitive load and their performance.

3 Materials and methods

The study was carried out as part of a larger project. Besides eye-tracking features described below, it included other measurements which are not covered in this paper, i.e. behavioral in-game data, cardiac activity, galvanic skin response, and cortical hemodynamics, measured by functional near-infrared spectroscopy [8, 74].

3.1 Participants

47 participants took part in this study. In the following, we present data of 42 participants (31 females, 11 males) aged between 19 and 48 years (M = 24.3; SD = 5.4). Five participants were excluded from the analysis due to poor quality of their eye-tracking data. All participants spoke fluent German and were right-handed. They were recruited via an online database and compensated for their time expenditure. None of the participants reported neurologic, psychiatric, or cardiovascular disorders, and none of them were taking psychotropic medications. The study was approved by the local ethics committee and written informed consent was obtained prior to the experiment.

3.2 Task

Participants played an adapted version of the serious game [Emergency: 75], simulating time-critical emergencies. There were two different emergency scenarios with three different levels of difficulty each. During the game, participants had to coordinate different emergency personnel, such as emergency doctors, paramedics, and firefighters, as well as ambulances, fire- and ladder trucks to rescue victims and extinguish fires.

After familiarizing themselves with the task by playing a learning sequence, all participants completed two experimental scenarios: Fire and Train Crash. The learning sequence consisted of a short tutorial followed by a car accident scenario where participants had to free all victims from the crashed vehicles, provide first aid and then arrange their transport to hospital. The time limit for the training scenario was 5 min.

In the Fire scenario, participants had to extinguish a burning building block, rescue some residents from burning houses, provide first aid, and arrange their transport to hospital within a time limit of 7.5 min. The scenario Train Crash involved a train crashing into a building and causing a quick-spreading fire. The scenario required participants to free trapped passengers, provide first aid, and arrange their transport to hospital, as well as extinguish numerous fires. The time limit for each level of this scenario was 10 min.

Each scenario was presented at three levels of difficulty: easy, medium, and hard, as defined by varying the number of tasks to be performed and the number of personnel to be coordinated (see Table 1). In this way, the increasing density of actions increased task demands in terms of planning, coordination, and prioritization, leading to varying levels of cognitive load. Time pressure was additionally induced by setting time limits for levels.

Table 1 Overview over the initial game parameters [cf. 8]

Full size table

3.3 Experimental setup and design

The experiment was performed in a quiet room under constant light conditions (see Fig. 2). The Emergency serious game was presented on a 16’’ notebook driven at a screen resolution rate of 1920 × 1080. A conventional computer mouse was used as the only interaction device. Gaze data were recorded at 250 Hz using a SensoMotoric Instruments (SMI) RED250 eye tracker with 0.4° gaze position accuracy in combination with SMI Experiment Center 3.7.60 software installed on the same notebook. The eye tracker was calibrated using SMIs’ integrated 9-point calibration procedure. The seating position of each participant was determined individually before the calibration of the eye tracker, without using a chin rest.

The study was implemented in a within-subject design, that is each participant completed all scenarios and levels. Each participant performed the same predefined sequence of levels only once. To minimize order effects, we decided to present the levels and scenarios in a constant sorted order, starting with the easiest one. The experiment began with the calibration followed by a baseline phase during which participants were asked to sit still and look at a fixation cross for 5 min to acquire baseline parameters of physiological measures. After that, participants completed an introductory learning sequence, followed by the two scenarios with their respective levels of difficulty (Fire: easy, medium, hard; Train Crash: easy, medium, hard). Subjective ratings of cognitive load experienced during the Emergency serious game were obtained intermittently after each level using the NASA-TLX questionnaire. The whole experiment lasted about one hour including the training time.

3.4 Features

To estimate cognitive load, we used eye-tracking features known from the literature in this regard. These data were recorded during narrow initial burst and initial idle time periods, which were calculated based on log data as described below. Hereafter, means of the respective eye-tracking measures for the respective time periods were associated with the difficulty scores of levels as well as participants’ performance and their subjective ratings of cognitive load.

3.4.1 Difficulty score and performance

For each level, a difficulty score was defined as the percentage of participants, who failed to complete all tasks within the predefined time limit (see Table 1). Performance was reflected individually for each participant as the binary indicator of whether the level was completed successfully or not.

3.4.2 Subjective rating of cognitive load

After completing each level, we asked participants to rate their subjectively experienced cognitive load by completing selected items of the NASA-TLX questionnaire [46]. In order not to disturb the eye-tracking calibration procedure, the items were read aloud by the experimenter while the subject was instructed to sit still and look at the monitor while answering. The NASA-TLX consists of six items rated on a 21-level scale (0 to 100 points with steps of 5), and its’ dimensions correspond to various theories distinguishing between physical, mental, and emotional demands imposed on the operator [76]. In this study, we considered the three items addressing the mental facet of operators’ load (i.e. mental demand, temporal demand, and effort) [77, 78].

3.4.3 Eye-tracking features

When analyzing gaze data, we focused on fixation frequency, saccadic and microcaccadic rates, number of blinks, and pupil diameter recorded durung initial burst and initial idle time periods. These eye-tracking features were extracted using SMI Experiment Center 3.7.60 software. During preprocessing of pupillometric data, we removed all data points where pupil diameter was non-positive, because such artifacts typically indicate invalid data. We also did not consider data up to 100 ms immediately before and after each blink, because during these periods the pupil is partially occluded by the eyelid or eyelashes and thus cannot be detected reliably [79, 80]. Finally, we linearly interpolated small gaps of up to 50 ms to increase the amount of usable data. The rate of microsaccades was computed using the method proposed by Krejtz et al. [81]. After preprocessing, we averaged the collected data over time, subtracted the respective baseline value, recorded during 5 min prior to the start of the experiment (see Sect. 2.3 Apparatus and experimental setup), and z-standardized all features.

3.4.4 Analyzed time periods

In the present article, we determined initial burst and initial idle periods individually based on logged in-game activities, with both time periods varying between participants. At the beginning of each level, an initial burst period and an initial idle period were recorded, resulting in six pairs of periods per participant.

3.5 Statistical analysis

We employed linear mixed-effect analyses using statistical software R [82] with the lme4 package [83]. The p-values were obtained by likelihood ratio tests of the full model tested against a reduced model. Further model analyses were applied in case of a significant result, using the report package of Makowski, et al. [84]. Standardized parameters were obtained by fitting a model on a standardized version of the dataset.

4 Results

In this section, we present in detail associations between gaze data collected during initial burst, and initial idle time periods and (1) difficulty score, (2) participants’ performance, and (3) subjective estimation of cognitive load. To ensure better readability and not overwhelm our readers with the vast amount of statistics, we have decided to only report significant results in detail.

4.1 Difficulty score

First, we aimed at investigating whether task difficulty affected oculomotor behavior during the initial burst and initial idle time periods. Therefore, we fitted linear mixed models for both time periods to predict eye-tracking data based on difficulty score. We considered difficulty score^{Footnote 1} as a fixed effect and added random intercepts for subjects. Table 1 depicts difficulty scored calculated per level as the percentage of participants, who failed to complete all tasks within the predefined time limit.

4.1.1 Initial burst

During the initial burst period we found significantly negative association between difficulty score and fixation frequency as well as with saccadic rate, whereas the effect of difficulty on pupil diameter was positive. That is, during initial burst phase more challenging levels were significantly associated with less fixations and saccades as well as with increased pupil diameter. See Fig. 3 and Table 2.

Table 2 initial burst: eye-tracking features ~ difficulty score

Full size table

4.1.2 Initial idle

We found no significant association between gaze behavior and difficulty score for any eye-tracking within the initial idle phase, see Table 3

Table 3 Initial idle: eye-tracking features ~ difficulty score

Full size table

4.2 Performance

We fitted linear mixed-effect models for each combination of gaze features and time periods on the relationship between performance and gaze data. We added random intercepts for participants and considered performance as a fixed effect. We also considered the scenario^{Footnote 2} as a fixed effect because we were interested in whether performance induces an effect in addition to the scenario.

4.2.1 Initial burst

During the initial burst phase we found a significantly negative effect of scenario on fixation frequency and saccadic rate, whereas the effect of scenario on pupil diameter was positive. That is, while playing more challenging Train Crash scenario participants did significantly less fixations and saccades, while their pupil diameter was significantly increased compared to Fire scenario. However, the associations between gaze data and performance showed the opposite pattern: we found positive effect on fixations, saccades and microsaccades, along with negative effect on blinks, meaning that participants who successfully completed the level, showed significantly more fixations, saccades and microsaccades, but blinked significantly less often during the initial burst phase compared to unsuccessful participants (see Fig. 4 and Table 4).

Table 4 Initial burst: eye-tracking features ~ scenario and performance

Full size table

We found a significant positive effect of performance on fixation frequency, saccadic and microsaccadic rates, meaning that participants who failed the level also exhibited less fixations, saccades and microsaccades during the initial burst phase. In contrast, the effect on blinks was significantly negative, meaning that more successful participants blinked less during initial burst phase.

4.2.2 Initial idle

During initial idle only pupil diameter differed significantly between scenarios, whereas more challenging Train Crash scenario was associated with increased pupil diameter. In contrast, performance was significantly positively associated with fixation frequency, saccadic, and microsaccadic rates, meaning that participants who succeed the level showed significantly more fixations, saccades and microsaccades compared with participants who failed. See Table 5 for detailed statistics.

Table 5 Initial idle: eye-tracking features ~ scenario and performance

Full size table

4.3 Subjective assessment of cognitive load

To investigate whether gaze behavior was influenced by subjectively reported cognitive load we fitted linear mixed-effect models for each combination of the inspected NASA-TLX items and gaze features on the relationship between subjective cognitive load and gaze data which resulted in 15 models for each time period. We included random intercepts for participants, whereas scenario was considered a fixed factor.

For all considered NASA-TLX items we found a significant difference between scenarios, whereas the Train Crash scenario was perceived as more demanding than scenario Fire regarding mental demand, time demand and effort, see Tables 6 and 7.

Table 6 Initial burst: subjective cognitive load ratings after finishing the level ~ scenario and eye-tracking features

Full size table

Table 7 Initial idle: subjective cognitive load ratings after finishing the level ~ scenario and eye-tracking features

Full size table

4.3.1 Initial burst

In addition to the effect of scenario during the initial burst period we found a significant negative effect of microsaccades on all three items: mental demand, time demand, and effort. That is, participant who reported to be more challenged in terms of mental demand, time demand and effort exhibited significantly less microsaccades than participants who rated their cognitive load lower, see Tables 6 and 7.

4.3.2 Initial idle

We found significant negative effect of microsaccades on time demand. That is, participants who reported to be more challenged regarding time demand and effort did significantly less microsaccades during initial idle phase compared to less challenged participants. We found no significant effect regarding mental demand, see Table 7.

4.4 Additional analyses

We hypothesized that more successful players should experience lower cognitive load, which should be reflected in their gaze behaviour–in particular, in lower ratings of subjective cognitive load among more successful participants. To evaluate this assumption, we fitted three linear mixed-effect models for each of the investigated NASA-TLX items to predict subjective ratings of cognitive load from performance. We included scenario and performance as fixed factors and participants as random intercepts in the models.

The association with scenario was significantly positive and the association with performance was significantly negative for all NASA-TLX items, see Table 8. That is, subjective ratings of cognitive load were higher in a more challenging Train Crash scenario, and at the same time more successful players reported lower cognitive load.

Table 8 Additional Analyses. NASA-TLX ~ scenario and performance

Full size table

5 Discussion

Our first goal was to investigate whether participants’ cognitive load during a time-critical Emergency serious game can be reliably estimated based on gaze features collected at the beginning of a game session within initial burst and initial idle time periods. As a second goal, we aimed at deepening our understanding of what cognitive processes occur during initial burst and idle phases. To identify these time periods we used behavioral log data interpreted in the light of the TBRS model [7] in line with a recent approach by Sevcenko et al. [8].

In the following we will first describe in detail how the presented approach can be used for prediction of task difficulty, operators’ performance, and subjectively perceived cognitive load, then we proceed to discuss which cognitive processes seem to happen during the initial idle phase and demonstrate correctness of the level construction of the used serious game. After that we present strengths and limitations of the study and briefly outline a possible direction of future research, followed by a general conclusion.

5.1 Difficulty, performance and cognitive load prediction

In general, the results of the present study substantiated our hypothesis that cognitive load might be predicted using eye-tracking features collected during time intervals related to initial TADD metric. Indeed, we found significant associations between gaze features during the indicated time periods and difficulty, performance, and cognitive load, although some of these associations were unexpected.

In line with our expectations, we found significant associations between performance and gaze behavior during initial idle: successful participants did significantly more fixations, saccades, and microsaccades during this time period as compared to participants who failed the level. Additionally, we found a significant negative association between microsaccadic rates and subjective ratings of cognitive load for both initial burst and initial idle time periods. Other investigated eye-tracking features showed no significant associations in this regard.

In contrast to our expectations, we found strong associations between gaze behavior, difficulty, and performance when considering the initial burst time period. Because we assumed that all participants would play at their cognitive limit during the initial burst phase, we expected no effects during this time. Although gaze features were associated with task difficulty in the expected way, the observed association with performance was in the opposite direction. For instance, we observed that participants performed significantly fewer fixations during the more challenging levels, but this association was significantly less pronounced in more successful participants. The same pattern was also evident for other gaze features. Importantly, this finding seems sensible and might indicate that more successful players experienced lower cognitive load, which is reflected in their gaze behavior. To test this assumption we conducted additional analyses (see Sect. 3.4 Additional Analyses) which showed exactly the same pattern of subjectively reported cognitive load ratings and thus further supported this account. As such, contrary to our expectations, the initial burst might be well suited to determine task difficulty and to-be-expected performance. One possible explanation for this finding is that time pressure, induced by the Emergency serious game, was not high enough to make all participants work on their cognitive limits. In this case, initial burst might be well suitable for measuring cognitive load in situations with low to medium time pressure, while for measuring cognitive load under high time pressure other options need to be identified. This hypothesis requires further investigation.

Taken together, our results support the idea that cognitive load and performance during HCI can be successfully captured based on gaze data collected during relatively narrow time windows, the latter derived by a theory-driven approach based on the TBRS model [7].

5.2 Cognitive processes during initial idle

Our second goal was to better understand what cognitive processes occur during initial idle, because no loggable in-game actions happen during this period. As expected, we found that successful participants performed significantly more fixations, saccades, and microsaccades during this time as compared to participants who failed the level. Based on this finding and previous evidence [8] it seems that more successful participants tend to use the initial idle time for more intensive visual exploration and monitoring of the game scene [58, 61, 73].

5.3 Manipulation check

Last but not least, it is worth noting that our results substantiated that levels of difficulty within the Emergency serious game were well constructed and effectively induced different levels of cognitive load. As expected, participants reported the Train Crash scenario to be cognitively more demanding than the Fire scenario. Likewise, the difficulty score which was calculated for each game level representing the percentage of participants who failed a level indicated that level difficulty increased as expected during the respective scenarios. Furthermore, we found significant negative associations between level of difficulty and fixation as well as saccadic frequency, suggesting that game levels differed in non-visual cognitive demand components such as strategic planning [56, 57, 60].

5.4 Strengths and limitations

Analytical approaches to HCIs are often based on data-driven probabilistic performance evaluations [85,86,87], which sometimes seem insufficient for estimating operators’ cognitive and emotional states. In such cases, (neuro-) physiological methods [for review see:88] seem advantageous, although often relatively complex to acquire and evaluate. Another peculiarity of physiological methods is that most of them require physical contact with the operator and therefore can hardly be used for online commercial developments. Based on these considerations, we employed the eye-tracking method in this study, because modern video-based eye-tracking systems may not require physical contact and thus can be used in a very subtle way [for an overview see: 52].

Nevertheless, we think that the main strength of our approach is represented by our theoretically informed top-down development relying on an appropriate theoretical framework (in this case TBRS by [7] was used). In this way, we hope to foster generalizability of this method to similar situations, which might be less feasible in pure data-driven approaches. For this purpose, we used the TBRS model [7], which specifically emphasizes the role of time pressure in inducing cognitive load and therefore seems particularly suitable for predicting cognitive load during time-critical HCI. Moreover, the proposed method allows for early prediction of operator performance and can therefore be used for the development of interactive adaptive HCIs.

However, our approach is not free of limitations. First, the proposed method applies only to a relatively narrow family of situations or tasks with certain characteristics. Further research is needed to determine whether this approach can be applied or easily adapted to other contexts, e.g., interactions without time pressure. Second, as mentioned above, it is not clear whether different degrees of time pressure during HCI must be considered when using this method. Third, the recording frequency of the eye-tracker used for the study was 250 Hz, which might represent a technical limitation for instance to detect microsaccades.

5.5 Conclusion

In this paper, we presented a novel theory-driven approach considering specific eye-tracking features to predict cognitive load during time-critical resource-managing situations in combination with TBRS theory. Eye-tracking data was collected during relatively narrow time windows at the beginning of the interaction with the simulation serious game and thus can be potentially used for real-time adaptation of human–computer interactions. Moreover, the detection of the time periods was based on log data and can be easily run in the background. Obtained results supported the proposed approach, eye-tracking features collected during the initial burst appeared to be well suited to predict performance and task difficulty. Fixations frequency, saccadic rate, and pupil diameter seem to be well suited to predict task difficulty during the initial burst phase. Fixation rate was the best indicator to predict performance during the initial burst. If an estimate of subjectively perceived cognitive load is required, the microsaccadic rate recorded during the initial action block might be a good option. These results illustrate how theoretic knowledge about the task structure may be used advantageously for the assessment of cognitive load. Although requiring further investigation in terms of reliability and generalizability, the presented approach seems promising for measuring cognitive load in realistic time-critical HCI, considering adaptation to operators’ mental needs.

Data Availability

Not applicable.

Code availability

Not applicable.

Notes

Both difficulty score and scenario reflect the same game characteristic. Here, the scenario represents a rough division of six game levels into two difficulty classes, while the difficulty score sorts these six levels and represents a finely graded difficulty scale. Because these metrics do not have the same numerical order (Scenario 1 includes difficulty levels 2, 35, and 64, while Scenario 2 includes difficulty levels 4, 52, and 71), it is not convenient to include them both simultaneously in a same mixed model analysis. In this case, we were interested in investigating whether eye-tracking features are able to discriminate between slighter variations in difficulty, so difficulty score was included in the analyses.
From the previous analyses, we already know that eye-tracking features recording during initial burst phase are able to respond to difficulty variations. In this next step, we aim to discover whether it is possible to distinguish between winners and losers within the same difficulty range. To get a clearer picture of the effect and to account for the real situation in which we are usually not able to make a reliable and fine-grained scenario classification, we included the scenario as an additional fixed effect in all further analyses.

References

Babiloni F (2019) Mental workload monitoring: new perspectives from neuroscience. In: Longo L, Leva M (eds) Human mental workload: models and applications. H-WORKLOAD 2019. Communications in Computer and Information Science, vol 1107. Cham: Springer, pp. 3-19
Gerjets P, Walter C, Rosenstiel W, Bogdan M, Zander TO (2014) Cognitive state monitoring and the design of adaptive instruction in digital environments: lessons learned from cognitive workload assessment using a passive brain-computer interface approach. Front Neurosci 8:385
Article Google Scholar
Charabati S, Bracco D, Mathieu P, Hemmerling T (2009) Comparison of four different display designs of a novel anaesthetic monitoring system, the ‘integrated monitor of anaesthesia (IMA™).’ Br J Anaesth 103(5):670–677
Article Google Scholar
Oviatt S, (2006) Human-centered design meets cognitive load theory: designing interfaces that help people think. In: Proceedings of the 14th ACM international conference on multimedia, pp. 871–880
Walter C, Rosenstiel W, Bogdan M, Gerjets P, Spüler M (2017) Online EEG-based workload adaptation of an arithmetic learning environment. Front Human Neurosci 11:286. https://doi.org/10.3389/fnhum.2017.00286
Article Google Scholar
Nebel S, Ninaus M (2019) New perspectives on game-based assessment with process data and physiological signals. In: Ifenthaler D, Kim YJ (eds) Game-Based Assessment Revisited. Springer International Publishing, Cham, pp 141–161. https://doi.org/10.1007/978-3-030-15569-8_8
Chapter Google Scholar
Barrouillet P, Bernardin S, Camos V (2004) Time constraints and resource sharing in adults’ working memory spans. J Exp Psychol Gen 133(1):83
Article Google Scholar
Sevcenko N, Ninaus M, Wortha F, Moeller K, Gerjets P (2021) Measuring cognitive load using in-game metrics of a serious simulation game. Front Psychol 12:906. https://doi.org/10.3389/fpsyg.2021.572437
Article Google Scholar
Miller GA (1956) The magical number seven, plus or minus two: some limits on our capacity for processing information. Psychol Rev 63(2):81
Article Google Scholar
Ayaz H, Shewokis PA, Bunce S, Izzetoglu K, Willems B, Onaral B (2012) Optical brain monitoring for operator training and mental workload assessment. Neuroimage 59(1):36–47
Article Google Scholar
Hancock G, Hancock P, Janelle C (2012) The impact of emotions and predominant emotion regulation technique on driving performance. Work 41(Supplement 1):3608–3611
Article Google Scholar
Fan J, Smith AP (2017) The impact of workload and fatigue on performance. In: Longo L. and Leva M. (eds.) Human mental workload: models and applications. H-WORKLOAD 2017. Communications in computer and information science, vol. 726, Cham: Springer, pp. 90-105
Hancock P (1989) The effect of performance failure and task demand on the perception of mental workload. Appl Ergon 20(3):197–205
Article Google Scholar
Prabhakar G, Mukhopadhyay A, Murthy L, Modiksha M, Sachin D, Biswas P (2020) Cognitive load estimation using ocular parameters in automotive. Transportation Engineering 2:100008
Article Google Scholar
Tokuda S, Obinata G, Palmer E, Chaparro A, (2011) Estimation of mental workload using saccadic eye movements in a free-viewing task. In: 2011 Annual international conference of the IEEE engineering in medicine and biology society, IEEE, pp. 4523–4529
Babu MD, JeevithaShree D, Prabhakar G, Saluja KPS, Pashilkar A, Biswas P (2019) Estimating pilots’ cognitive load from ocular parameters through simulation and in-flight studies. J Eye Movement Res. https://doi.org/10.16910/jemr.12.3.3
Article Google Scholar
Aasted CM et al (2015) Anatomical guidance for functional near-infrared spectroscopy: atlasviewer tutorial. Neurophotonics 2(2):020801
Article Google Scholar
Smith-Jackson TL, Klein KW (2009) Open-plan offices: task performance and mental workload. J Environ Psychol 29(2):279–289
Article Google Scholar
Yurko YY, Scerbo MW, Prabhu AS, Acker CE, Stefanidis D (2010) Higher mental workload is associated with poorer laparoscopic performance as measured by the NASA-TLX tool. Simulation Healthcare 5(5):267–271. https://doi.org/10.1097/SIH.0b013e3181e3f329
Article Google Scholar
Yerkes RM, Dodson JD (1908) The relation of strength of stimulus to rapidity of habit-formation. J Comp Neurol Psychol 18(5):459–482
Article Google Scholar
Csikszentmihalyi M, Csikszentmihalyi I, Graef R, Holcomb JH, Hendin J, MacAloon J (1975) Eds. Beyond boredom and anxiety, 1 ed. (Behavioral Science Series). San Francisco, London: Jossey-Bass Publishers
Kiili K, Lindstedt A, Ninaus M, (2018) Exploring characteristics of students' emotions, flow and motivation in a math game competition. In: GamiFIN Conference, Pori, Finland, May 21–23, 2018, pp. 20–29
Anderson KJ (1994) Impulsitivity, caffeine, and task difficulty: a within-subjects test of the Yerkes-Dodson law. Personality Individ Differ 16(6):813–829
Article Google Scholar
Montani F, Vandenberghe C, Khedhaouria A, Courcy F (2020) Examining the inverted U-shaped relationship between workload and innovative work behavior: The role of work engagement and mindfulness. Human Relations 73(1):59–93
Article Google Scholar
Cummings ML and Nehme CE (2009) Modeling the impact of workload in network centric supervisory control settings. In: presented at the 2nd annual sustaining performance under stress symposium, 25 February
Orru G, Longo L (2019) The evolution of cognitive load theory and the measurement of its intrinsic, extraneous and Germane loads: a review. In: Longo L. and Leva M. (eds.) Human mental workload: models and applications. H-WORKLOAD, 2018 Communications in computer and information science, Cham: Springer, pp. 23–48
Michon JA (1993) Generic intelligent driver support. CRC Press, UK
Google Scholar
Piechulla W, Mayser C, Gehrke H, König W (2003) Reducing drivers’ mental workload by means of an adaptive man–machine interface. Transport Res F: Traffic Psychol Behav 6(4):233–248
Article Google Scholar
Riener A, Noldi J, (2015) Cognitive load estimation in the car: practical experience from lab and on-road tests. In: Adjunct proceedings of automotive UI 2015, workshop practical experiences in measuring and modeling drivers and driver-vehicle interactions, 2015, pp. 4
Wu C, Tsimhoni O, Liu Y (2008) Development of an adaptive workload management system using the queueing network-model human processor (QN-MHP). IEEE Trans Intell Transp Syst 9(3):463–475
Article Google Scholar
Lin B, Wu C (2010) Mathematical modeling of the human cognitive system in two serial processing stages with its applications in adaptive workload-management systems. IEEE Trans Intell Transp Syst 12(1):221–231
Article Google Scholar
Kohlmorgen J et al., (2007) Improving human performance in a real operating environment through real-time mental workload detection. In: Dornhege G, Millan JDR, Hinterverger T, McFarland DJ, and Müller K-R. (eds.) Toward Brain-computer interfacing, vol. 409422,. Cambridge, Massachusetts, London, England: MIT Press, 2007, ch. 24, pp. 409–422
Wilson GF, Purvis B, Skelly J, Fullenkamp P, Davis I, (1987) Physiological data used to measure pilot workload in actual flight and simulator conditions. In: Proceedings of the human factors society annual meeting, 1987, vol. 31, no. 7: SAGE Publications Sage CA: Los Angeles, CA, pp. 779–783
Veltman J, Gaillard A (1996) Physiological indices of workload in a simulated flight task. Biol Psychol 42(3):323–342
Article Google Scholar
Strenzke R, Uhrmann J, Benzler A, Maiwald F, Rauschert A, Schulte A, (2011) Managing cockpit crew excess task load in military manned-unmanned teaming missions by dual-mode cognitive automation approaches. In: AIAA guidance, navigation, and control conference, 2011, pp. 6237
Roth G, Schulte A, Schmitt F, Brand Y (2019) Transparency for a Workload-adaptive cognitive agent in a manned-unmanned teaming application. IEEE Trans Human-Machine Syst 50(3):225–233
Article Google Scholar
Aricò P et al (2016) Adaptive automation triggered by EEG-based mental workload index: a passive brain-computer interface application in realistic air traffic control environment. Front Hum Neurosci 10:539
Article Google Scholar
Wilson GF, Russell CA (2007) Performance enhancement in an uninhabited air vehicle task using psychophysiologically determined adaptive aiding. Hum Factors 49(6):1005–1018
Article Google Scholar
Sarkar P, Ross K, Ruberto AJ, Rodenbura D, Hungler P, Etemad A, (2019) Classification of cognitive load and expertise for adaptive simulation using deep multitask learning. In: 2019 8th International conference on affective computing and intelligent interaction (ACII), 2019: IEEE, pp. 1–7
Mirbabaie M, Fromm J (2019) Reducing the cognitive load of decision-makers in emergency management through augmented reality
Yuksel BF et al. (2016) Learn piano with BACh: an adaptive learning interface that adjusts task difficulty based on brain state. In: Proceedings of the 2016 CHI conference on human factors in computing systems, 2016, pp. 5372–5384
Brünken R, Seufert T, Paas F, (2010) Measuring cognitive load
Eggemeier FT, Wilson GF, Kramer AF, Damos DL (1991) Workload assessment in multi-task environments. In: Damos DL (ed) Multiple-task performance. Taylor & Francis, London, Washington, DC, pp 207–216
Google Scholar
Johannsen G (1979), Workload and workload measurement. In: Mental Workload, vol. 8, N. Moray Ed., (NATO Conference. Boston: Springer, 1979, pp. 3–11
Reid GB, Nygren TE (1988) The subjective workload assessment technique: a scaling procedure for measuring mental workload. In: Advances in psychology, vol. 52: Elsevier, 1988, pp. 185–218
Hart SG, Staveland LE (1988) Development of NASA-TLX (Task Load Index): results of empirical and theoretical research. In: Advances in psychology, vol. 52: Elsevier, 1988, pp. 139–183
Azcarraga J, Suarez MT (2013) Recognizing student emotions using brainwaves and mouse behavior data. Int J Distance Educ Technol (IJDET) 11(2):1–15
Article Google Scholar
Haapalainen E, Kim S, Forlizzi JF, Dey AK (2010) Psycho-physiological measures for assessing cognitive load. In: Presented at the proceedings of the 12th ACM international conference on Ubiquitous computing, Copenhagen, Denmark, 2010. [Online]. Available: https://doi.org/10.1145/1864349.1864395
FakhrHosseini SM, Jeon M (2019) How do angry drivers respond to emotional music? A comprehensive perspective on assessing emotion. J Multimodal User Interfaces 13(2):137–150. https://doi.org/10.1007/s12193-019-00300-3
Article Google Scholar
Liu R, Walker E, Friedman L, Arrington CM (2020) fNIRS-based classification of mind-wandering with personalized window selection for multimodal learning interfaces. J Multimodal User Interfaces 15:257–272. https://doi.org/10.1007/s12193-020-00325-z
Article Google Scholar
Ninaus M et al (2014) Neurophysiological methods for monitoring brain activity in serious games and virtual environments: a review. Int J Technol Enhanced Learn 6(1):78–103. https://doi.org/10.1504/IJTEL.2014.060022
Article Google Scholar
Hutton SB (2019) Eye tracking methodology. In: Klein C, Ettinger U (eds) Eye Movement research: an introduction to its scientific foundations and applications. Springer International Publishing, Cham, pp 277–308
Chapter Google Scholar
Rayner K (1998) Eye movements in reading and information processing: 20 years of research. Psychol Bull 124(3):372
Article Google Scholar
Pouget P (2019) Introduction to the study of eye movements. In: Klein C, Ettinger U (eds) Eye movement research: an introduction to its scientific foundations and applications. Springer International Publishing, Cham, pp 3–10
Chapter Google Scholar
Clifton C Jr et al (2016) Eye movements in reading and information processing: Keith Rayner’s 40 year legacy. J Mem Lang 86:1–19
Article Google Scholar
Chen S, Epps J, Ruiz N, Chen F (2011) Eye activity as a measure of human mental effort in HCI. In: Proceedings of the 16th international conference on intelligent user interfaces, 2011, pp. 315–318
De Rivecourt M, Kuperus M, Post W, Mulder L (2008) Cardiovascular and eye activity measures as indices for momentary changes in mental effort during simulated flight. Ergonomics 51(9):1295–1319. https://doi.org/10.1080/00140130802120267
Article Google Scholar
Van Orden KF, Limbert W, Makeig S, Jung T-P (2001) Eye activity correlates of workload during a visuospatial memory task. Hum Factors 43(1):111–121. https://doi.org/10.1518/001872001775992570
Article Google Scholar
Goldberg JH, Kotval XP (1999) Computer interface evaluation using eye movements: methods and constructs. Int J Ind Ergon 24(6):631–645
Article Google Scholar
Nakayama M, Takahashi K, Shimizu Y (2002) The act of task difficulty and eye-movement frequency for the'Oculo-motor indices. In: Proceedings of the 2002 symposium on Eye tracking research & applications: ACM Digital Library, 2002, pp. 37–42
Benedetto S, Pedrotti M, Bridgeman B (2011) Microsaccades and exploratory saccades in a naturalistic environment. J Eye Mov Res. https://doi.org/10.16910/jemr.4.2.2
Article Google Scholar
Gao X, Yan H, Sun H-J (2015) Modulation of microsaccade rate by task difficulty revealed through between-and within-trial comparisons. J Vis 15(3):3–3. https://doi.org/10.1167/15.3.3
Article Google Scholar
Siegenthaler E et al (2014) Task difficulty in mental arithmetic affects microsaccadic rates and magnitudes. Eur J Neurosci 39(2):287–294. https://doi.org/10.1111/ejn.12395
Article Google Scholar
Alexander RG, Martinez-Conde S (2019) Fixational Eye Movements. In: Klein C, Ettinger U (eds) Eye Movement Research: an Introduction to its Scientific Foundations and Applications. Springer International Publishing, Cham, pp 73–115
Chapter Google Scholar
Siegle GJ, Ichikawa N, Steinhauer S (2008) Blink before and after you think: Blinks occur prior to and following cognitive load indexed by pupillary responses. Psychophysiology 45(5):679–687. https://doi.org/10.1111/j.1469-8986.2008.00681.x
Article Google Scholar
Andreassi JL (2013) Psychophysiology: Human behavior & physiological response, 4th edn. Lawrence Erlbaub Associates, USA
Book Google Scholar
Fukuda K, Stern JA, Brown TB, Russo MB (2005) Cognition, blinks, eye-movements, and pupillary movements during performance of a running memory task. Aviation, Space, Environ Med 76(7):C75–C85
Google Scholar
Klingner J, Tversky B, Hanrahan P (2011) Effects of visual and verbal presentation on cognitive load in vigilance, memory, and arithmetic tasks. Psychophysiology 48(3):323–332. https://doi.org/10.1111/j.1469-8986.2010.01069.x
Article Google Scholar
Murray RB, Adler MW, Korczyn AD (1983) The pupillary effects of oploids. Life Sci 33(6):495–509
Article Google Scholar
Richman JE, McAndrew KG, Decker D, Mullaney SC (2004) An evaluation of pupil size standards used by police officers for detecting drug impairment. Optometry-J Am Optom Assoc 75(3):175–182
Article Google Scholar
Hess EH (1965) Attitude and pupil size. Sci Am 212(4):46–55
Article Google Scholar
Hess EH, Polt JM (1960) Pupil size as related to interest value of visual stimuli. Science 132(3423):349–350
Article Google Scholar
He X, Wang L, Gao X, Chen Y. (2012) The eye activity measurement of mental workload based on basic flight task. In: IEEE 10th international conference on industrial informatics, 2012: IEEE, pp. 502–507
Appel T, et al. (2019) Predicting Cognitive load in an emergency simulation based on behavioral and physiological measures. In: 2019 International conference on multimodal interaction, W. Gao et al. Eds. New York United States: Association for Computing Machinery, 2019, pp. 154–163
Promotion Software GmbH. "World of Emergency." Promotion Software GmbH. https://www.world-of-emergency.com/?lang=en (accessed August 26, 2019, 2019)
Hart SG (2006) ASA-task load index (NASA-TLX); 20 years later. In: Proceedings of the human factors and ergonomics society annual meeting, vol. 50, no. 9). Los Angeles, CA: Sage Publications CA, 2006, pp. 904–908
Haerle SK, Daly MJ, Chan HH, Vescan A, Kucharczyk W, Irish JC (2013) Virtual surgical planning in endoscopic skull base surgery. Laryngoscope 123(12):2935–2939
Article Google Scholar
Temple JG, Dember WN, Warm JS, Jones KS, LaGrange CM, (1997) The effects of caffeine on performance and stress in an abbreviated vigilance task. In: Proceedings of the human factors and ergonomics society annual meeting, 1997, vol. 41, no. 2: SAGE Publications Sage CA: Los Angeles, CA, pp. 1293–1297
Kret ME, Sjak-Shie EE (2019) Preprocessing pupil size data: guidelines and code. Behav Res Methods 51(3):1336–1342
Article Google Scholar
Mathôt S, Fabius J, Van Heusden E, Van der Stigchel S (2018) Safe and sensible preprocessing and baseline correction of pupil-size data. Behav Res Methods 50(1):94–106
Article Google Scholar
Krejtz K, Duchowski AT, Niedzielska A, Biele C, Krejtz I (2018) Eye tracking cognitive load using pupil diameter and microsaccades with fixed gaze. PLoS ONE 13(9):e0203629
Article Google Scholar
R Core Team, "R: A Language and Environment for Statistical Computing," 2020. [Online]. Available: https://www.R-project.org/
Bates D, Mächler M, Bolker B, Walker S (2014) Fitting linear mixed-effects models using lme4. arXiv preprint arXiv:1406.5823
Makowski D, Lüdecke D, Ben-Schachar M (2020) Automated reporting as a practical tool to improve reproducibility and methodological best practices adoption. J Open Source Softw 5:2815
Google Scholar
Magerko B, Stensrud BS, Holt LS, (2006) Bringing the schoolhouse inside the box-a tool for engaging, individualized training. SOAR TECHNOLOGY INC ANN ARBOR MI, 01.11.2006 2006. Accessed: 12.03.2021. [Online]. Available: https://apps.dtic.mil/sti/pdfs/ADA481593.pdf
Spronck P, Ponsen M, Sprinkhuizen-Kuyper I, Postma E (2006) Adaptive game AI with dynamic scripting. Mach Learn 63(3):217–248
Article Google Scholar
Zook AE, Riedl MO, (2012) A temporal data-driven player model for dynamic difficulty adjustment. In: 8th Artificial intelligence and interactive digital entertainment conference
Kivikangas JM et al (2011) A review of the use of psychophysiological methods in game research. J Gaming Virtual Worlds 3(3):181–199
Article Google Scholar

Download references

Funding

Open Access funding enabled and organized by Projekt DEAL. University of Tuebingen/IWM.

Author information

Authors and Affiliations

Daimler Truck AG, Stuttgart, Germany
Natalia Sevcenko
Faculty of Science, Eberhard Karl University, Tuebingen, Germany
Natalia Sevcenko
Faculty of Economics and Social Sciences, Hector Research Institute of Education Sciences and Psychology, Tuebingen, Germany
Tobias Appel
Institute of Psychology, University of Graz, Graz, Austria
Manuel Ninaus
Centre for Mathematical Cognition, School of Science, Loughborough University, Loughborough, UK
Korbinian Moeller
Leibniz-Institut fuer Wissensmedien, Tuebingen, Germany
Korbinian Moeller & Peter Gerjets
LEAD Graduate School & Research Network, University of Tuebingen, Tuebingen, Germany
Manuel Ninaus, Korbinian Moeller & Peter Gerjets

Authors

Natalia Sevcenko
View author publications
You can also search for this author in PubMed Google Scholar
Tobias Appel
View author publications
You can also search for this author in PubMed Google Scholar
Manuel Ninaus
View author publications
You can also search for this author in PubMed Google Scholar
Korbinian Moeller
View author publications
You can also search for this author in PubMed Google Scholar
Peter Gerjets
View author publications
You can also search for this author in PubMed Google Scholar

Contributions

All authors contributed to the conception of the work. NS, MN, KM and PG conceptualized and designed the study, NS conducted the study. TA preprocessed data. NS conducted the statistical analyses. NS wrote the first draft of the manuscript which was edited in several rounds with TA, and MN. KM and PG provided the last rounds of edits on the manuscript. All authors revised the final manuscript.

Corresponding author

Correspondence to Natalia Sevcenko.

Ethics declarations

Conflicts of interest

The authors declare a conflict of interest: “Author Natalia Sevcenko was employed by the company Daimler Truck AG. The remaining authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.”

Ethical approval

Study was approved by IWM Ethics Committee.

Consent to participate

All participants signed written informed consent prior to the experiment.

Consent for publication

All authors give their consent for publication.

Additional information

Publisher's Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/.

Reprints and permissions

About this article

Cite this article

Sevcenko, N., Appel, T., Ninaus, M. et al. Theory-based approach for assessing cognitive load during time-critical resource-managing human–computer interactions: an eye-tracking study. J Multimodal User Interfaces 17, 1–19 (2023). https://doi.org/10.1007/s12193-022-00398-y

Download citation

Received: 25 May 2021
Accepted: 04 October 2022
Published: 28 November 2022
Issue Date: March 2023
DOI: https://doi.org/10.1007/s12193-022-00398-y

Keywords

Use our pre-submission checklist

Avoid common mistakes on your manuscript.

Theory-based approach for assessing cognitive load during time-critical resource-managing human–computer interactions: an eye-tracking study

Abstract

Similar content being viewed by others

Development of a Quantitative Evaluation Tool of Cognitive Workload in Field Studies Through Eye Tracking

Insights from Eye Movement into Dynamic Decision-Making Research and Usability Testing

Cognitive Workload of Humans Using Artificial Intelligence Systems: Towards Objective Measurement Applying Eye-Tracking Technology

1 Introduction

1.1 Cognitive load

1.2 Eye-tracking

1.2.1 Fixations

1.2.2 Saccades

1.2.3 Microsaccades

1.2.4 Blinks

1.2.5 Pupil dilation

2 Summary

2.1 Time-based resource-sharing model

2.1.1 Temporal action density decay metric

2.2 Present study

3 Materials and methods

3.1 Participants

3.2 Task

3.3 Experimental setup and design

3.4 Features

3.4.1 Difficulty score and performance

3.4.2 Subjective rating of cognitive load

3.4.3 Eye-tracking features

3.4.4 Analyzed time periods

3.5 Statistical analysis

4 Results

4.1 Difficulty score

4.1.1 Initial burst

4.1.2 Initial idle

4.2 Performance

4.2.1 Initial burst

4.2.2 Initial idle

4.3 Subjective assessment of cognitive load

4.3.1 Initial burst

4.3.2 Initial idle

4.4 Additional analyses

5 Discussion

5.1 Difficulty, performance and cognitive load prediction

5.2 Cognitive processes during initial idle

5.3 Manipulation check

5.4 Strengths and limitations

5.5 Conclusion

Data Availability

Code availability

Notes

References

Funding

Author information

Authors and Affiliations

Contributions

Corresponding author

Ethics declarations

Conflicts of interest

Ethical approval

Consent to participate

Consent for publication

Additional information

Publisher's Note

Rights and permissions

About this article

Cite this article

Share this article

Keywords

Search

Navigation