1 Introduction

Human–Computer Interaction (HCI) is a scientific field that studies how people interact with technology [1]. This field focuses on the design, development, and evaluation of computer systems and technologies, and its main objective is to propose methods to create interfaces and systems that are easy to use, efficient and effective [2]. On the other hand, Educational Computing (EC) refers to the use of technology, such as computers, software and digital resources, in educational processes [3]. In educational computing, technology is used to facilitate teaching and learning, increase student engagement and motivation, personalize the learning experience, and provide access to resources and information beyond traditional classroom materials. EC can take many forms, such as online learning environments, multimedia resources, and educational games, among others. HCI and EC researchers and practitioners attempt to understand the cognitive and physical processes involved in interaction and learning processes and use that knowledge to design interfaces and systems tailored to the needs, preferences, capabilities, and limitations of users and learners.

Human factors play a crucial role in HCI and EC. Human factors refer to the physical, cognitive, and social factors that affect how humans interact with technology [4]. The consideration of human factors in the design of interfaces can affect, for example, the layout, the use of certain feedback mechanisms, or the selection of different controls or formats for input and output of information. This is why, in the design of interactive systems, it is necessary to take into account aspects related to basic psychological principles, such as perception, attention or memory [5]. Human factors research in EC seeks to understand how students interact with learning materials and environments as well as the unique characteristics of learners that can impact their learning experiences, such as personality traits, cognitive abilities, learning styles, and physical abilities.

Moreover, considering human factors makes it possible to support the diversity and individual differences of potential users and learners, improving the accessibility and customization of the systems. In this sense, the fields of HCI and EC have a responsibility to consider the diversity of users in the design and implementation of technologies, learning environments and instructional materials. Thus, for example, user interfaces must be designed to accommodate users with different types of disabilities, for which assistive technologies must be designed. Similarly, instructional materials can be designed to incorporate multiple modalities, such as visual and auditory learning. Interfaces must also be created to be used by people with different cognitive abilities and learning styles. The ultimate goal is to provide interactive and personalized learning experiences that improve engagement, motivation or learning outcomes.

Eye tracking is a technique, widely used in psychology, that makes it possible to study the individual behaviors of users and learners [6, 7]. Its usefulness rests on the hypothesis that there is a link between visual scanning behavior and the cognitive activity that the subject performs (the so-called “eye-mind hypothesis”, which suggests that if users look at something, it is because they are thinking about it). Although this relationship does not always hold (we do not always attend to what we are looking at), nor is it immediate, it is sufficiently consistent to draw objective conclusions about the cognitive processes that originate or trigger the so-called fixations (gaze stabilization points) and the visual exploration behaviors of subjects [8].

The use of eye tracking therefore makes it possible to answer research questions such as the following:

  • What do users or learners look at, and what do they ignore, when they view a user interface or educational material?

  • What catches their attention? What distracts them?

  • How do users or learners explore and examine the information displayed on screen? What is the order of exploration they follow?

  • Given a certain configuration (interface or instructional material), are the subjects able to find a certain target in it? How long does it take them to locate it? How long does it take them to recognize it?

  • Given two different designs or configurations, do they take longer to locate and identify a given target in one design or the other? Therefore, which design allows them to locate the visual information of interest more efficiently?

  • Given two different designs or configurations, which one implies a greater visual effort or imposes a greater cognitive load on the user?

The use of eye tracking has become widespread in HCI because it helps researchers understand how users interact with interfaces, providing valuable information on how people process information, navigate interfaces, and respond to visual stimuli. This technique has been successfully used to evaluate the usability of interactive systems [5, 9] and websites [10, 11], and has made it possible to evaluate empirically visual phenomena such as the Von Restorff effect or banner blindness [12]. In education, eye tracking has been used for a variety of purposes, such as the analysis of reading behavior [13,14,15], attention and distraction issues [16, 17], learning styles [18,19,20,21], as well as the use of different formats of instructional materials and their relationship to learning outcomes [22,23,24,25].

The authors of this paper have extensive experience in the use of the eye tracking technique for the evaluation of interactive systems and educational resources. However, when applying it, we have found a lack of methodological proposals and compilations of good practices to follow in this type of study. Therefore, the aim of this paper is to propose a set of recommendations and guidelines, as well as a methodological approach, based on and validated by our results and previous experience, intended as a useful tool for researchers who want to use this assessment technique in their work.

This paper is organized as follows: Sect. 2 provides a brief introduction to the eye tracking technique; Sect. 3 describes some of the main studies carried out by our research team applying this technique in the fields of HCI and Educational Computing; Sect. 4 describes a proposal of objective and subjective metrics that can be gathered in these study fields; Sect. 5 discusses the main factors and guidelines to be considered in this type of experiment; and Sect. 6 proposes a methodological approach, or sequence of steps, to follow when using the technique. Finally, Sect. 7 presents the conclusions of the present work and the lines of future work.

2 Eye tracking foundations and main application areas

This section describes what eye tracking is, what this technique consists of, and the areas in which it has been most exploited. Next, an introduction to the type of information obtained in an eye tracking session and how it is interpreted is provided.

2.1 What is eye tracking?

The concept of eye tracking refers to a set of technologies that make it possible to monitor the way in which a person looks at a given image or scene. In particular, it makes it possible to record on which areas the person fixes more attention, for how long, and in what order he or she explores the visualized scene. The function of eye tracker devices is to determine and record where the person directs his or her focus of vision and, therefore, which area of the scene is perceived more clearly at a given moment.

2.2 Application areas

Eye tracking has been used in various areas to analyze and understand human behavior and cognitive processes. This technique has been used extensively in psychology and neuroscience to study cognitive processes such as attention, perception, memory, and decision-making [26,27,28]. In the field of marketing and advertising, eye tracking has been used to study consumer behavior [29], evaluate product packaging and design, and develop effective advertising campaigns [30,31,32]. In the medical and health field [33], it has been used to study eye movements in patients with neurological and ophthalmological disorders [34], such as autism [35,36,37], Parkinson’s disease [38,39,40], and glaucoma [41, 42], as well as for diagnosis [43,44,45,46], tracking the progression of diseases, and monitoring the effects of treatments in various medical conditions [47, 48]. It has also been used in fields as diverse as safe driving [49] (in particular, for detecting distractions [50] and fatigue [51,52,53]), tourism [54], sports [55], aviation [56], manufacturing and logistics [57], and cartography [58], among others. In engineering, eye tracking has been used in several ways: to detect trace links to reduce manual effort in construction tasks [59], to try to explain why different information formats affect working memory development and retrieval in industrial settings [60], or to investigate construction workers’ visual attention allocation during hazard recognition [61]. In these areas, eye tracking technologies allow the real-time capture of visual scanning behaviors related to different human cognitive, emotional, and physiological states. Being aware of these states is especially useful in high-risk industries such as aviation, maritime and construction [62]. However, only a small number of eye tracking applications have been used in industrial settings such as engineering design, production, or assembly [63].

By contrast, eye tracking has been widely used in the field of engineering sciences education [64], especially to reveal the cognitive processes and strategies of engineering students for spatial problem-solving [65]. This technique has also been the subject of interest in the field of software engineering [66], in order to analyze the visual behaviors of programmers when they try to read [67], comprehend [68,69,70], debug [71] or detect errors in the code of a program [72, 73], and to study code traceability [74] and collaborative programming [66, 75,76,77]. It has also been used to identify different analysis patterns according to programmer profiles, such as expertise [78,79,80,81], age [82], or gender [83,84,85], among others.

In some cases, the eye tracking technique is not only used to support “a posteriori” evaluation processes but also to provide feedback in successive refinement or instructional scaffolding approaches. Thus, for example, in [86], the eye tracker was used to instantly identify students’ difficulties in a C programming course and provide them with tips. The authors demonstrated that employing this smart eye tracking feedback scaffolding approach significantly improved students’ self-efficacy. With a similar approach, [87] describes a course for pre-service teachers (advanced chemistry student teachers) who were trained to design, choose, and use appropriate digital multimedia in their future work as teachers. In this course, the trainees created learning materials following the principles of multimedia design and evaluated them using eye tracking. This allowed them to reflect on the quality of the learning materials they created and to better understand the perceptions and cognitive processes of their future students. This work demonstrated that the feedback provided by the device in the context of this course influenced the course participants’ attitude, subjective norm, and self-efficacy toward the use of digital media in teaching.

Nowadays, there is a growing trend to combine this non-invasive tracking technique with other sources of biometric and physiological information, such as electroencephalography (EEG) or electromyography (EMG) [88]. The availability of low-cost sensors and wearable devices, such as activity bracelets, has contributed to their combined use with other sources of information [89, 90]. For example, in [91], EEG, EMG and eye tracking are combined in the fields of astronaut training and space exploration. However, the most common combination when applying these mixed methods is EEG and eye tracking. Thus, for example, there are works that combine both sources of information in fields as diverse as autism detection [44], marketing [92], emotion recognition [93], or software engineering (to analyze program comprehension or the effect of programming experience on the programmer’s efficacy [94]).

The incorporation of eye movement-based recording is also of interest in novel interaction styles and paradigms, such as Brain-Computer Interaction (BCI) or Virtual and Augmented Reality (VR/AR). Thus, for example, the combination of eye tracking and BCI has been used for robotic control [95] or to support interaction for Amyotrophic Lateral Sclerosis (ALS) patients [96]. On the other hand, the integration of eye trackers in commercial head-mounted displays (HMD) has enabled the application of eye tracking in multiple fields and disciplines, such as education and military, medical, or sports training, as well as in research in the fields of marketing, applications with therapeutic purposes, or consumer experience (CX), among others [97].

One of the areas in which eye tracking has been most exploited is HCI, where it is used to evaluate interactive systems [98, 99], mainly web pages [10, 100] and mobile applications [101,102,103]. Eye tracking has also been widely used in education [18, 104, 105] to study reading strategies and comprehension [14, 15, 106, 107], to analyze and better understand classroom behavior [108], and to study multimedia educational materials, a sub-area that has been among the most prolific in research papers in recent years [22, 23, 109].

2.3 What is obtained in an eye tracking session and how to interpret it?

When visually exploring a scene, the eyes move in rapid jumps or movements, called saccades. Between one saccade and the next, there is a fixation, or period of stabilization of the eye (which allows the focused area to be seen sharply). Fixations provide the most valuable information to be extracted and interpreted using the eye tracking technique. From their number, order, and duration, we can extract valuable conclusions about the visual scanning behavior of a subject.

To facilitate the interpretation of the large amount of data collected during an eye tracking session, two types of representations are usually used to graphically summarize the visual behavior of a user or set of users. The first is a static representation of the saccadic scan path (Fig. 1a), which shows the sequence of fixations (points) and saccades (lines). The size of the dots indicates the duration of the fixations, while the numbering of the dots indicates their order. The second, especially suitable for analyzing the visual scanning patterns of individual subjects or groups of users, is the so-called heat map (Fig. 1b). Heat maps highlight the areas on which the subject has focused the most attention (i.e., where the subject has looked the longest or most often). The range of colors (green to red) indicates the points on which fixations were more or less frequent (or of longer or shorter duration).

Fig. 1 (a) Example of scan path. (b) Example of heat map
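
The idea behind a heat map can be illustrated with a minimal Python sketch that accumulates fixation durations on a coarse grid. The (x, y, duration) layout of a fixation, the function name and the cell size are illustrative assumptions, and the sketch omits the smoothing and color mapping that visualization software usually applies.

```python
from typing import List, Tuple

# Assumed layout of one fixation: (x in pixels, y in pixels, duration in ms).
Fixation = Tuple[float, float, float]


def heat_map(fixations: List[Fixation], width: int, height: int,
             cell: int = 100) -> List[List[float]]:
    """Sum fixation durations per grid cell (hypothetical helper).

    Cells with larger totals correspond to the 'hotter' areas of the map.
    """
    cols = (width + cell - 1) // cell   # number of columns, rounded up
    rows = (height + cell - 1) // cell  # number of rows, rounded up
    grid = [[0.0] * cols for _ in range(rows)]
    for x, y, duration in fixations:
        # Ignore fixations that fall outside the stimulus.
        if 0 <= x < width and 0 <= y < height:
            grid[int(y) // cell][int(x) // cell] += duration
    return grid


if __name__ == "__main__":
    sample = [(50, 50, 200), (60, 70, 300), (250, 150, 100)]
    for row in heat_map(sample, width=300, height=200):
        print(row)
```

Replacing the duration with a constant 1 would give a map based on fixation frequency instead of fixation time.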

The appearance of the scan paths allows the identification of certain behaviors or situations quickly and visually. Thus, for example, a more complex scan path (in terms of duration, length, and density) may be related to a less efficient scan or search for information, while a lower density may be indicative of more direct searches. Similarly, the order of the fixations (the direction of the scan path) may allow us to identify certain patterns or strategies for searching or locating information in the image (top-down, bottom-up, etc.).
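
Two of the scan path properties mentioned above, length and duration, can be quantified with a short Python sketch. The fixation layout (x, y, duration) and the function name are assumptions made for illustration; the duration reported here is the sum of fixation durations only, and density is left out because its definition varies between studies.

```python
import math
from typing import Dict, List, Tuple

# Assumed layout of one fixation: (x in pixels, y in pixels, duration in ms),
# listed in viewing order.
Fixation = Tuple[float, float, float]


def scan_path_summary(fixations: List[Fixation]) -> Dict[str, float]:
    """Summarize a scan path (hypothetical helper)."""
    # Length: sum of straight-line distances between consecutive fixations.
    length = sum(
        math.dist(a[:2], b[:2]) for a, b in zip(fixations, fixations[1:])
    )
    # Duration: total time spent fixating (saccade time is not included).
    duration = sum(f[2] for f in fixations)
    return {
        "fixations": len(fixations),
        "length_px": length,
        "duration_ms": duration,
    }


if __name__ == "__main__":
    print(scan_path_summary([(0, 0, 100), (3, 4, 200), (3, 10, 150)]))
```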

In eye tracking sessions, one of the main steps is the definition of the so-called areas of interest (AOI) of the image or user interface under evaluation. These are the regions that are of interest to the evaluator or the design team and that will be analyzed more exhaustively through the calculation of certain metrics.

If we want to make a quantitative analysis of the information recorded in an eye tracking session, we can make use of a large number and variety of metrics [6, 110]. These objective measures are used to analyze aspects related to attention, usefulness, or the cognitive load or mental effort required to understand the information displayed on the entire screen or in a specific area of interest. Some of the most used metrics are based on fixation counts, fixation duration, the number of visits and revisits to an AOI, percentages of time, or the number of fixations on a given AOI relative to viewing the entire scene or image. The latter type of measure allows the identification of the regions of the screen most attended or visited by users while exploring the stimuli under study.
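
As a minimal sketch of the AOI-based metrics just listed, the following Python code computes fixation count, total fixation time, share of viewing time, and visits and revisits for a rectangular AOI. The rectangular shape, the class and function names, and the (x, y, duration) fixation layout are illustrative assumptions; a visit is counted here each time the gaze enters the AOI from outside.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Assumed layout of one fixation: (x, y, duration in ms), in viewing order.
Fixation = Tuple[float, float, float]


@dataclass
class AOI:
    """Rectangular area of interest (hypothetical representation)."""
    name: str
    x: float
    y: float
    w: float
    h: float

    def contains(self, px: float, py: float) -> bool:
        return self.x <= px < self.x + self.w and self.y <= py < self.y + self.h


def aoi_metrics(fixations: List[Fixation], aoi: AOI) -> Dict[str, float]:
    inside = [aoi.contains(x, y) for x, y, _ in fixations]
    count = sum(inside)  # fixation count on the AOI
    dwell = sum(d for (_, _, d), hit in zip(fixations, inside) if hit)
    total = sum(d for _, _, d in fixations)
    # A visit starts whenever a fixation lands in the AOI after one outside it.
    visits = sum(
        1 for i, hit in enumerate(inside) if hit and (i == 0 or not inside[i - 1])
    )
    return {
        "fixation_count": count,
        "fixation_time_ms": dwell,
        "time_share": dwell / total if total else 0.0,
        "visits": visits,
        "revisits": max(visits - 1, 0),
    }


if __name__ == "__main__":
    menu = AOI("menu", 0, 0, 100, 100)
    data = [(10, 10, 200), (50, 50, 100), (200, 200, 300), (20, 20, 400)]
    print(aoi_metrics(data, menu))
```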

One of the problems with the use of these metrics is that, in some cases, there is no unanimity regarding their interpretation, so it is advisable to complement and contrast the information provided by the eye tracker with other sources of information.

3 Previous work

In recent years, our research team has made intensive use of eye tracking in the fields of Human–Computer Interaction and Educational Computing, with uses somewhat different from the usual ones. This section compiles some of our most relevant work, which is summarized in Table 1.

Table 1 Eye tracking studies carried out in the fields of Educational Computing and HCI

In the first work [111], we analyzed the effectiveness of using various interaction devices (desktop computers, tablets and smartphones) to access learning materials. This was our first substantial published study to make use of an eye tracking device. In it, we performed a qualitative analysis of the records obtained (analysis of the complexity and density of the scan paths) while recording total study time and time spent browsing or analyzing the content to be retained. We also measured objective aspects such as learning efficiency, as well as subjective aspects (technology acceptance, cognitive load, and aspects related to the participant’s attitude during the development of the experiment). The experiments described in this work led to the conclusion that learners’ performance differed depending on the device used to access learning materials. The use of mobile devices increases the time dedicated to study and imposes an additional cognitive load (a more complex visual scanning pattern) compared to access through devices with greater visualization capacity, such as desktop computers and tablets.

Another of our research lines is the conceptual modeling of interactive and collaborative systems. Our research team has proposed several graphical notations to model the requirements of this type of system, most notably the CIAN notation [112] for modeling computer-supported cooperative work (CSCW) and computer-supported collaborative learning (CSCL) systems [113]. In a second published work [114] that made use of the eye tracker, we set out to analyze and compare the CIAN notation with CTT, another notation widely used in the field of HCI [115]. Specifically, we were interested in analyzing the acceptance, understandability, and complexity (cognitive load) of both notations by potential designers of collaborative learning systems when performing comprehension tasks on diagrams created with both specification techniques. In this case, objective metrics (provided by the eye tracker device) were used to measure the cognitive load or processing difficulty of the individual elements that made up the diagrams created using the two visual languages under comparison. In this experiment, participants were asked questions about the information represented in the diagram. The ease of locating and recognizing the visual elements (icons and graphic components) of the graphic model that were needed to answer the questions was recorded and compared using the findability and recognition measures provided by the eye tracking device. An analysis of diagram comprehension at such a low level would not have been possible without the use of this tracking device. The study concluded that the CIAN notation facilitated the location and recognition of the graphic elements representing the main concepts to be modeled and imposed a lower cognitive load than the CTT notation.

One of the main aspects to consider when designing collaborative systems is their support for awareness. Awareness is defined as “knowledge of the activities of other users, which provides a context for one’s own activity” [118]. This information makes it possible to know, but also to anticipate, the actions of others, which results in a more effective collaborative interaction. Implementing it implies including elements and widgets in the user interface such as session panels, telepointers, information on the status of the other members participating in the group activity, etc. Our research team has extensive experience in the design, implementation and evaluation of collaborative applications in the field of teaching programming, and the result of this work is the COLLECE application [119]. COLLECE is a distributed synchronous programming system that incorporates many widgets and components to support communication (e.g., chat), coordination (turn-taking and voting systems), and awareness (telepointers, activity logs, session panel, avatars, and semaphores, among others). In a third work [75], we set out to assess the usefulness and usability of these components, incorporating traditional evaluation measures (questionnaires) but also eye tracking measures. In this experiment, an eye tracker device was used to measure visual attention, which can be interpreted as usefulness, of the elements that make up the COLLECE interface in the context of the group programming task. This work concluded that the most attended and, therefore, most useful elements were the currently edited line of code (highlighted in the shared editor), the session panel (which incorporates the avatars of the peers and what they are doing), and the semaphores (indicating the turn ownership). Although usefulness was also measured with a validated and well-known framework, TAM (Technology Acceptance Model) [120], incorporating the eye tracking record made it possible to check whether the subjective answers regarding utility and usability given by the participants were consistent with their behavior (visual attention) when interacting with the components of the graphical user interface.

Another application for teaching/learning programming, evaluated using eye tracking techniques, is GreedEx, an interactive assistant for teaching and simulating greedy algorithms [121]. This application incorporates several program visualization formats (code, graphical and tabular representations). In this study [116], we tried to analyze the usefulness and complexity of the three representation formats supported by this interactive tool. As in previous works, the objective of incorporating the ocular recording was to contrast the subjective opinion of the participants with their behavior during the task. In this case, visual attention (as an indicator of usefulness) and cognitive load (as a measure of comprehension and processing difficulty) were measured for each of the representations supported in GreedEx. The conclusions of the work were that the students considered the tabular representation to be the most useful, followed by the graphical representation, with the source code being the least used element. However, the eye tracking yielded different results, as it indicated that the most attended and used elements during the task were the source code and the graphical representation. Regarding the elements that imposed the greatest cognitive load, the objective and subjective results were consistent, and both revealed that the source code was the most difficult element to understand.

One of the lines of research that has generated most interest in recent years, in terms of the application of this technique, has been the evaluation of the effectiveness and efficiency of multimedia learning materials [22, 23]. In this sense, Mayer’s principles for the design of efficient educational content are a reference [122]. In another of our experiments, we wanted to evaluate the application of some of the design principles of multimedia learning materials for primary school students [117]. In this work, we measured aspects related to attention, cognitive load, and educational efficiency of different configurations of the same material for geometry teaching. For this purpose, we conducted four experiments incorporating an eye tracking device in the data gathering process. Using this technique, it is possible to analyze how the process of visual analysis of multimedia content takes place. This evaluation method makes it possible to validate empirically and objectively (physiologically) aspects such as the comprehension process of multimedia contents, the attention of students while analyzing the information provided, or the cognitive load imposed by the visualized materials. In contrast to questionnaires or surveys, which have traditionally been used to analyze the effectiveness of the design principles of educational materials, eye tracking provides information that is not consciously controlled by the learners. Moreover, its use is a particularly interesting and useful source of information when children are involved in the evaluation, since it allows gathering information about their interests and preferences, which is more difficult to obtain with traditional (perception-based) techniques.

Finally, we would like to highlight the benefits and added value of incorporating the eye tracking technique in the studies described. Its use makes it possible to complement and improve the interpretation of the results obtained when evaluating educational or interactive systems and contents using traditional evaluation methods (based on the subjective perception of the participants), such as questionnaires, interviews and self-reports. Incorporating this additional objective information source, of a physiological nature and not consciously controlled by the subject, yields information that enriches the final analysis and makes it possible to contrast and validate the other sources of information. Its use is also particularly valuable when working with specific groups, such as children, who find it difficult to verbalize their preferences or describe their behaviors.

The subjective or perception-based measurements and the metrics provided by the eye tracker thus complement each other, helping to mitigate the limitations of each approach by providing different perspectives on the same behavior. Objective metrics provide accurate and quantitative measures of eye movements but do not reveal the subjective experience of the participant. Subjective metrics, collected through questionnaires, interviews and self-reports, may be biased by individual differences, social desirability, or response biases, but they capture and reveal aspects related to visual behavior that are not evident when dealing exclusively with eye tracking data. The combined and complementary analysis of both types of information makes it possible to test whether, for example, participants’ self-reported interest correlates with an objective attention metric (e.g., fixation duration), thus validating that the data provided by eye tracking are reliable and their interpretation valid. Combining both sources of information therefore improves the interpretation of the results and, ultimately, supports a better understanding of the perceptual and cognitive processes of the participants while visualizing a given content.
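
The kind of check described above, whether self-reported interest correlates with an objective attention metric, can be sketched in Python as follows. The choice of Pearson’s coefficient, the function name and the sample values are illustrative assumptions; since questionnaire ratings are ordinal, a rank-based coefficient may be a more suitable choice in practice.

```python
import math
from typing import Sequence


def pearson(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient between two paired samples."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("samples must be paired and contain at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    if spread_x == 0 or spread_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / (spread_x * spread_y)


if __name__ == "__main__":
    # Made-up values for illustration only: one pair per participant.
    self_reported_interest = [2, 3, 3, 4, 5]           # e.g., a 1-5 rating
    fixation_time_ms = [900, 1400, 1250, 2100, 2600]   # time on the same AOI
    print(round(pearson(self_reported_interest, fixation_time_ms), 3))
```

A coefficient close to 1 would suggest that participants who report more interest also look longer at the element; the significance of the result would still need to be tested separately.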

Therefore, based on our experience, we can state that eye tracking is a powerful tool for measuring and analyzing individual differences in HCI and EC, which can help researchers and designers create more personalized and effective interactive and learning experiences. Our experience has also allowed us to identify and compile a series of lessons learned, recommendations and best practices, which are described in the following sections. This set of guidelines is applicable to evaluation processes in the two areas of this work (HCI and EC), but it would also be useful in the areas described in Sect. 2.2.

4 Proposal of measures in eye tracking experiments

In our research using the eye tracking technique, we have relied on the combined use of objective metrics (those provided by the eye tracking device) with other, more subjective sources of information. The following subsections provide a compilation of the measures used in these studies, which constitute our proposal of measurement instruments and eye tracking metrics to be used in studies in the fields of HCI and EC.

4.1 Objective measures: eye tracking and performance metrics

This section compiles and defines the main quantitative and objective metrics that could be collected in eye tracking sessions aimed at assessing aspects such as the usefulness, effectiveness or cognitive load of interactive software or of a given educational material or resource:

  • Performance metrics: Some objective measures can be collected in this type of experiment. For example, it may be necessary to know the time to assimilate an educational content, provide a response, or locate and interact with a particular element in an interface or image (TTC: time to completion). This objective metric is a way to measure the efficiency or quality of a given displayed content and can be easily recorded by eye tracking software. In the case of the assessment of educational contents or resources, the so-called learning efficiency (LE) can be measured [123]. LE is calculated as the ratio between the score obtained by the learner (test score or TS) in a retention test (by which the learner’s knowledge is measured) and the time spent assimilating the content (TTL: time to learn): LE = TS/TTL.

  • Eye tracking measures: There are many metrics that can be extracted from an eye tracking session. An extensive review is presented in [6], indicating how they are measured and their meaning or most common interpretation. The set of available metrics is usually divided into two broad groups: those that measure user interest in a particular part of the image and those that measure performance.

    Among the interest metrics we find pupil diameter, the percentage of participants who fixated on an AOI, the total number of fixations (FC) on that AOI or the total time spent inspecting it (TFD), as well as the percentage of total visual inspection time spent looking at one AOI versus the rest (TFD/TTC). All these metrics are related to interest, attention, and even usefulness (we show more interest or visually attend more to the areas of the screen that are more useful to us in the performance of a given task).

    On the other hand, we find another set of metrics that allow measuring the performance of a user in the fulfillment of a task. Some of them are indicative of the mental effort and cognitive processing associated with the visual analysis of an image or interface. Such is the case of pupil diameter or the average fixation duration in an AOI (AFD). Longer fixation durations indicate that more time is required to understand individual objects, i.e., greater difficulty in extracting information. There are also metrics that indicate how quickly an element is located in an image. These include the time to the first fixation on the AOI containing the relevant information or target (TFF) or the number of fixations generated until the first fixation on that target (FB). But locating an object in an image is not the same as recognizing it. Measures of object recognition include the number of target visits before answering (VC) or the time between the first fixation on the target and the time at which the response to the task is given (TFFA).

    Table 2 shows our proposal of eye tracking metrics to measure aspects related to usability, cognitive load, and effectiveness in user interfaces or educational resources and materials.
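As an illustration of how the metrics described above could be derived from a list of fixations, the following minimal Python sketch computes FC, TFD, AFD, TFF, FB and TFD/TTC for one AOI, together with LE = TS/TTL. The `Fixation` structure and the function names are hypothetical and do not correspond to the export format of any particular eye tracking software; FB is assumed here to count the fixations that precede the first fixation on the AOI.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Fixation:
    start_ms: float        # onset time, relative to stimulus onset (assumed)
    duration_ms: float     # fixation duration
    aoi: Optional[str]     # label of the AOI that was hit, or None


def aoi_metrics(fixations: List[Fixation], aoi: str, ttc_ms: float) -> dict:
    """Compute FC, TFD, AFD, TFF, FB and TFD/TTC for a single AOI."""
    ordered = sorted(fixations, key=lambda f: f.start_ms)
    hits = [f for f in ordered if f.aoi == aoi]
    fc = len(hits)                                  # fixation count
    tfd = sum(f.duration_ms for f in hits)          # total fixation duration
    # Index of the first fixation on the AOI (None if it was never fixated).
    first_idx = next((i for i, f in enumerate(ordered) if f.aoi == aoi), None)
    return {
        "FC": fc,
        "TFD": tfd,
        "AFD": tfd / fc if fc else None,            # average fixation duration
        "TFF": ordered[first_idx].start_ms if first_idx is not None else None,
        "FB": first_idx,                            # fixations before the first hit
        "TFD/TTC": tfd / ttc_ms if ttc_ms > 0 else None,
    }


def learning_efficiency(test_score: float, time_to_learn: float) -> float:
    """LE = TS / TTL, as defined in the text."""
    if time_to_learn <= 0:
        raise ValueError("time_to_learn must be positive")
    return test_score / time_to_learn
```

The units of LE depend on the units chosen for TS and TTL (for example, points per minute), so they should be kept constant across participants and conditions.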

Table 2 Eye tracking metrics and their interpretation

4.2 Subjective measures

In the different experiments carried out, several qualitative instruments were used to complement and contrast, through a subjective source of information, the aspects measured by the eye tracking device: interest, usefulness and cognitive load. Which of them to apply depends on the specific objective of each work; this section compiles those that we consider most interesting and useful in this type of experiment:

  • TAM-based questionnaire An adaptation of the Technology Acceptance Model (TAM) [120] framework is proposed, which makes it possible to measure, using 5-point Likert scales, aspects such as the perceived ease of use, perceived usefulness, and usage intentions of the specific technology or element of the user interface under assessment. Perceived usefulness (PU) refers to the degree to which a user believes that a technology will help them achieve their goals and improve their performance. It is influenced by factors such as the features and functions of the technology, the user’s prior experience with similar technologies, and the perceived compatibility between the technology and the user’s needs. Perceived ease of use (PEU) refers to the degree to which a user believes that a technology will be easy to use and learn. It is influenced by factors such as the simplicity of the technology’s interface, the user’s prior experience with similar technologies, and the availability of training and support. Intention to use the technology (ITU) refers to the user’s intention to use the technology in the future. It is influenced by factors such as their attitudes towards using technology, their perceived usefulness and ease of use, and their social norms and expectations.

  • CLT-based questionnaire Cognitive Load Theory (CLT) is a framework that describes how the human brain processes and retains information [133]. It suggests that our working memory has limited capacity and that we can only process a certain amount of information at a time [134]. This theory differentiates between three types of cognitive load: intrinsic, germane and extraneous [135]. Intrinsic cognitive load refers to the inherent complexity of the material being learned, and it is determined by the number of elements and their interactivity in the learning material. Extraneous cognitive load refers to the unnecessary cognitive processing that can occur when learning, such as distractions or irrelevant information. And, finally, germane cognitive load refers to the mental effort required to process the information in a way that leads to meaningful learning. It is related to the processing of information that is essential for understanding and long-term retention. This theory argues that by understanding and managing these three types of cognitive load, educators and instructional designers can develop effective learning materials and strategies that optimize working memory and promote deeper learning. In several of our experiments, we have included questionnaire items to measure the three types of cognitive load identified by this theory using 5-value Likert scales.

  • Subjective measures of engagement, attitude, and motivation during the test Other subjective aspects of a personal nature may influence the user’s behavior during the assessment session and should therefore be recorded. In the different experiments developed, subjective questionnaires have been incorporated to measure aspects such as interest in the activity or task performed (INT), intrinsic motivation (MOT), the effort made by the participants to perform the activity well (EFF), the voluntary nature of participation in the activity (CHO), the participants’ perception of their level of competence in carrying out the activity (COM), as well as the pressure or strain experienced during the test (PRE).

  • Aspects related to the user’s profile Demographic measures and other aspects of the participant’s profile should be collected, especially in the pre-test questionnaire, in order to assess and contrast the influence of personal factors on test performance and on the metrics recorded. As demographic factors, we have mainly considered gender and age, to detect differentiated behavior between men and women [30, 83, 136,137,138] and across age ranges [138, 139]. One factor that we believe may influence user behavior is personal innovativeness in the domain of information technology (PIIT) [140]. This concept can be defined as the predisposition, attitude, or tendency of a person to experiment with and adopt new information technologies. Other factors that we consider essential are [141,142,143]: experience and/or expertise in the use of a particular type of software or device, the attitude or personal opinion of the participants about its use, pertinence, or adoption for a given objective, as well as prior knowledge (PK) in a certain study domain, in the case of evaluating the efficacy or effectiveness of particular educational materials or resources. All these aspects can be measured by statements rated on a Likert scale (1: “strongly disagree” to 5: “strongly agree”). In some of our works [117], mainly those carried out in the educational field, we have also recorded student characteristics, such as learning style [18,19,20,21, 144] or intelligence quotient (IQ) [145].
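The questionnaires listed above all rely on items rated on 1 to 5 Likert scales and grouped into constructs (PU, PEU, ITU, the three types of cognitive load, INT, MOT, and so on). A minimal Python sketch of how such responses might be aggregated into one score per construct is given below. The item identifiers and the item-to-construct mapping are hypothetical, averaging the items of a construct is only one common scoring choice, and the optional reverse coding of negatively worded items is an assumption that is not described in this section.

```python
from statistics import mean
from typing import Dict, FrozenSet, List

# Hypothetical mapping from constructs to questionnaire item identifiers.
CONSTRUCT_ITEMS: Dict[str, List[str]] = {
    "PU": ["pu1", "pu2", "pu3"],
    "PEU": ["peu1", "peu2"],
    "ITU": ["itu1", "itu2"],
}


def score_constructs(
    responses: Dict[str, int],
    construct_items: Dict[str, List[str]],
    reverse_coded: FrozenSet[str] = frozenset(),
    scale_min: int = 1,
    scale_max: int = 5,
) -> Dict[str, float]:
    """Average the Likert items of each construct for one participant."""
    scores = {}
    for construct, items in construct_items.items():
        values = []
        for item in items:
            value = responses[item]
            if not scale_min <= value <= scale_max:
                raise ValueError(f"{item}: {value} is outside the scale")
            if item in reverse_coded:
                # Assumption: negatively worded items are mirrored on the scale.
                value = scale_max + scale_min - value
            values.append(value)
        scores[construct] = mean(values)
    return scores
```

For example, a participant answering 4, 5 and 3 to the three hypothetical PU items would obtain a PU score of 4.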

4.3 Proposal of measures in other application areas

Finally, we would like to remark on the possible use of the objective and subjective metrics proposed and compiled in this section in other areas, such as those listed in Sect. 2.2.

The set of objective metrics recorded by the eye tracker can be directly applied to other areas, and their interpretation is as indicated in this section (Table 2). In the case of objective performance metrics, it will be necessary to select those specific to each domain and take into account the objective of the task to be analyzed. The time to completion (TTC) of a given task could be applied in other areas, but not the time to learn (TTL) or learning efficiency (LE).

As for the set of subjective metrics, most of the measures collected (CLT, motivation, effort, pressure, demographic factors, expertise, prior experience, etc.) could be directly applied in other studies. The exceptions are those provided by the TAM framework, which are more oriented to studies of adoption, acceptance and usability of technologies and applications, and those related to the subject’s educational profile (learning style or IQ).

5 A tool set of recommendations and guidelines

One of the main challenges faced by researchers who want to make use of this technique is the lack of guidelines and methodological proposals to help in its application [146]. There are some contributions in specific fields, such as software engineering [147] or mathematics education [148], as well as good practices for the creation of results reports in studies of this type [149].

In this section, we compile the main lessons learned from our experience applying the eye tracking technique with different objectives in the fields of HCI and Educational Computing, and discuss some of the main aspects to take into account (Fig. 2) regarding: (5.1) the design of the experiment and its development; (5.2) issues related to the application of this specific evaluation technique; (5.3) issues related to the recruitment and profile of the participants and their influence; and finally, (5.4) some considerations to take into account in the phases of analysis, interpretation and reporting of results.

Fig. 2 Main issues to consider in an eye tracking session in HCI and EC

5.1 Experiment design and development issues

This subsection discusses some of the main aspects related to the experiment’s design and development. Specifically, recommendations are made in relation to the planning and design of the experiment, the control of the main factors that can affect the development of eye tracking tests, as well as the need to carry out pilot studies to adjust and configure all these aspects before approaching the final experimentation.

5.1.1 [R01] Design a schedule for the tests and reserve a time slot for each participant

In this type of test, it is important to adjust the time needed to properly attend to each participant. Each experimental subject will be scheduled for a specific time slot, in which sufficient time should be allowed for: providing a detailed explanation of the test; reading, signing, and resolving doubts regarding the informed consent; completing the pretest; performing the calibration (which sometimes involves several iterations if this is not done correctly the first time); performing the experimental task(s); and finally, completing the post-test.

5.1.2 [R02] Defining the hypotheses

The adequate definition of the starting hypotheses is essential in any experimentation, and therefore, it is also an aspect of great importance when carrying out assessments based on the use of eye tracking. The definition of the starting hypotheses guides the research design, helping researchers select the appropriate stimuli, make the correct delimitation of the AOIs, and choose the most relevant metrics to answer the research questions. A clear formulation of the initial hypothesis establishes an adequate framework for the collection, analysis and interpretation of the measures collected.

5.1.3 [R03] Control environmental conditions

Environmental conditions such as lighting, noise, temperature, and the presence of other people can affect the quality of eye tracking data. Therefore, it is proposed to perform the experiments in controlled environments, whenever possible, to minimize their impact and ensure that the data recorded during the eye tracking session is reliable and accurate.

In particular, the lighting conditions in the room can have a significant impact on the accuracy and reliability of eye tracking experiments. Some of the effects they can produce are glare or reflections, which can influence the ability of eye tracking equipment to accurately detect eye movements. Lighting conditions can also affect the screen contrast of the displayed stimuli. One of the metrics that can be affected by lighting conditions is pupil size. Bright light conditions can cause pupils to constrict, while dim light conditions can cause pupils to dilate. Changes in pupil size can affect both the calibration of the eye tracker and the interpretation of this metric, if used in the analysis. If there are any manufacturer’s guidelines for illumination conditions, they should be consulted and considered.

5.1.4 [R04] Configuration and adjustment of the physical space

The adjustment and configuration of the physical space occupied by the participants are essential when preparing an eye tracking session. Such settings include adjusting the distance, height, and angle of the eye tracking system and the recording camera (if present during the experimentation), as well as making sure that the participant’s head is stabilized and correctly positioned. The use of height-adjustable chairs is recommended to suit the characteristics of each participant. Keeping the physical conditions of the tests stable among the different participants will allow a better comparison of the results obtained.

5.1.5 [R05] Correct calibration

During the calibration process, the eye tracker is adjusted to the user’s specific eye movements and characteristics. Thus, calibration is an essential step in an eye tracking session to ensure that the eye tracker is accurately capturing the participant’s eye movements.

This is a task that may need to be performed several times before starting the recording with each participant and until the calibration is deemed to be correct. It is important to note that calibration should be performed at the beginning of each eye tracking session to account for any changes in the participant’s positioning or lighting conditions that may affect the accuracy of the data.

5.1.6 [R06] Early problem detection

Conducting a pilot study before the main study is essential for the early detection of potential problems, as it allows researchers to test and refine their experimental design and procedures. In the context of eye tracking sessions, pilot studies can help researchers identify potential issues with the eye tracking equipment, such as poor calibration or tracking accuracy, and make adjustments to ensure reliable data collection.

5.2 Eye tracking technique issues

There are a series of issues to be considered that depend on the specific characteristics of this evaluation technique and that concern the appropriate selection of the metrics, the definition of the areas of interest, the type of visual stimulus and its nature, as well as the measurement of the quality of the records obtained during the tests.

5.2.1 [R07] Perform an appropriate selection and interpretation of metrics

The selection of appropriate metrics is crucial in eye tracking sessions. It is important to choose metrics that are suitable for the specific research questions being addressed and to avoid using those that do not provide relevant or meaningful information.

It is particularly noteworthy that some measures admit more than one interpretation or may be affected by the nature of the task to be addressed (e.g., its complexity), participant characteristics or circumstances (e.g., motivation, interest, fatigue), or environmental conditions (e.g., lighting, which affects pupillary dilation). It is therefore advisable to combine and complement the measures provided by the eye tracker with additional metrics, both objective (recordings of the participant’s face and expressions, audio recordings) and subjective (the participant’s own perception), against which the eye tracker measures can be contrasted.

Choosing inappropriate metrics or interpreting them incorrectly can lead to misleading results, which can compromise the validity of the experiment.

5.2.2 [R08] Combine subjective and objective measures

It is recommended to combine objective and subjective metrics in eye tracking experiments to understand more comprehensively the behavior and visual experiences of the participants. Objective metrics, mainly based on fixations (quantity and duration), provide quantitative physiological measures that allow measuring aspects related to attention, cognitive load, and exploration of the visualized stimuli. On the other hand, subjective metrics, based on subjects’ personal opinions about aspects such as task difficulty, interests and preferences, provide qualitative information on their perceptions and experiences, as well as on the factors that may be influencing their attention and visual behavior during the tests.

5.2.3 [R09] Consider visual stimuli type (static vs. dynamic)

One of the aspects that can most influence and complicate the processing and analysis of the data recorded in an eye tracking session is the type of visualized stimuli. In this regard, we may encounter three different situations: (a) a stimulus of a static nature is displayed (an image or screenshot); (b) a video is projected or a fixed-frame video game is interacted with (in which it is known, to a certain degree, what appears on screen at a certain instant of time or frame); and (c) a user interface (application or web page) is interacted with freely, so that what appears on screen at each instant of time cannot be predicted, since it depends on the interaction performed.

Depending on the nature of the stimuli, metrics gathering becomes more complex, and it is necessary to extract the information from video fragments (case b) or even to analyze one by one the records generated by each subject (case c).
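For case (b), where the content shown at each instant is known in advance, one possible way to handle the added complexity is to attach a validity time window to each AOI, so that a gaze sample is only assigned to an AOI while that AOI is actually on screen. The Python sketch below illustrates this idea under stated assumptions: the `TimedAOI` structure, rectangular regions in pixel coordinates, and times in milliseconds from stimulus onset are all hypothetical choices, not features of a specific analysis tool.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class TimedAOI:
    name: str
    x: float            # left edge, in pixels
    y: float            # top edge, in pixels
    width: float
    height: float
    t_start_ms: float   # instant at which the AOI appears on screen
    t_end_ms: float     # instant at which it disappears (exclusive)

    def contains(self, gx: float, gy: float, t_ms: float) -> bool:
        """True if the gaze point falls inside the AOI while it is visible."""
        in_time = self.t_start_ms <= t_ms < self.t_end_ms
        in_space = (self.x <= gx < self.x + self.width
                    and self.y <= gy < self.y + self.height)
        return in_time and in_space


def label_sample(aois: List[TimedAOI], gx: float, gy: float,
                 t_ms: float) -> Optional[str]:
    """Return the name of the first AOI containing the sample, or None."""
    for aoi in aois:
        if aoi.contains(gx, gy, t_ms):
            return aoi.name
    return None
```

In case (c) no such time windows can be fixed beforehand, which is why the records of each subject have to be inspected individually.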

5.2.4 [R10] Consider the nature of the visual stimuli and the experimental task

The type of visual stimuli and the task to be performed by the experimental subjects have a significant influence on the eye movement patterns and the study results.

Different types of stimuli can elicit different types of eye movements and attentional processes, which can affect the interpretation of the results. Thus, for example, the complexity of a visual stimulus may affect eye movements, resulting in more frequent and prolonged fixations since participants may need more time to process the displayed information. The familiarity of the visualized stimulus also affects the pattern of movements: visualizing an unfamiliar stimulus may result in more dispersed and exploratory eye movements, whereas a familiar stimulus tends to produce more concentrated movements of shorter duration. The visualized content may also generate certain emotional reactions in the participant (attraction, rejection, fear, etc.), leading to different behaviors and exploratory responses to the scene. Also, the amount and legibility of the text (font, size, etc.) contained in the visualized stimuli will give rise to very different records.

The complexity and type of experimental task to be performed will also influence the cognitive and attentional processes and, therefore, the participant’s pattern of eye movements. Thus, for example, some tasks will require the participants to locate certain information or target, or to memorize it, while others will require freer interaction and visualization. The duration and frequency of fixations, as well as the number of saccades generated, will therefore depend on the type of task to be performed in each case.

In summary, when carrying out the experimental design, researchers should carefully consider the type and characteristics of the stimuli used (cognitive and attentional demand, familiarity, emotional valence), as well as the task to be performed (free versus target-driven or task-driven), since their nature influences the eye movements and the interpretation of the data.

5.2.5 [R11] Properly define the AOIs

The definition of Areas of Interest (AOIs) is a fundamental step in most eye tracking studies. AOIs are specific regions or objects in a visual scene that are of interest to the researcher and relevant to the study to be conducted. They can be defined to delimit, for example, the area occupied by a specific widget or component in a user interface, distracting elements in educational material or a web page, or the part of the scene that contains the response or target of a given task. AOIs are defined prior to the eye tracking session and allow the analysis and collection of metrics to be restricted to specific areas or regions of the visual stimulus. Their definition makes it possible to compare, for example, the interest or attention paid by the participants to the different regions of an interface, and to determine which regions are attended to most and which are ignored.

When defining AOIs, there are several aspects to take into account: the research questions (which guide their definition), the characteristics of the stimulus (complexity, size, resolution, etc.), the information contained in the specific region (text, image, distractor, advertisement, widget, etc.), and the type of task to be performed (search and location of a target, detection of distracting elements in an interface or educational material, etc.).
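As a rough illustration of how AOIs support this kind of comparison, the following Python sketch assigns each fixation to a rectangular AOI and reports the share of total fixation time received by each region, as well as the regions that received none. The rectangle representation, the assumption that AOIs do not overlap, and the function names are illustrative choices only.

```python
from typing import Dict, List, Tuple

Rect = Tuple[float, float, float, float]      # x, y, width, height (pixels)
FixationXY = Tuple[float, float, float]       # x, y, duration in ms


def attention_share(fixations: List[FixationXY],
                    aois: Dict[str, Rect]) -> Dict[str, float]:
    """Fraction of the total fixation time spent inside each AOI."""
    totals = {name: 0.0 for name in aois}
    for fx, fy, duration in fixations:
        for name, (x, y, w, h) in aois.items():
            if x <= fx < x + w and y <= fy < y + h:
                totals[name] += duration
                break  # assumption: AOIs do not overlap
    grand_total = sum(duration for _, _, duration in fixations)
    return {name: (t / grand_total if grand_total else 0.0)
            for name, t in totals.items()}


def ignored_aois(share: Dict[str, float]) -> List[str]:
    """AOIs that received no fixation time at all."""
    return [name for name, value in share.items() if value == 0.0]
```

In practice, AOIs are often drawn with a small margin around the element of interest to absorb tracker inaccuracy; the size of that margin is a design decision that this sketch leaves out.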

5.3 Participant-related issues

This subsection discusses aspects related to the participants in eye tracking tests and their influence on the interpretation, generalization, and quality of the results. Among the aspects to be considered are recruitment, sample size, ethical and data protection issues, the performance of tests with groups with particular characteristics (children, people with disabilities), and the problems related to sample loss, which is quite common in this type of study.

5.3.1 [R12] Plan participant recruitment

Participant recruitment is a crucial phase in conducting an eye tracking experiment. A well-designed recruitment strategy can help ensure that the study involves a diverse and representative sample of participants that can provide valid and reliable data. In this process, the characteristics of the target population must be determined, which should be those relevant to the research question and hypothesis (age, gender, or previous experience with the visualized stimuli or the task to be performed).

5.3.2 [R13] Consider the participant profile

The participant’s profile, including age, gender, and other demographic factors, can influence eye tracking sessions in a variety of ways.

Research has shown that eye movement patterns can change with age due to differences in visual acuity and attentional and cognitive abilities that are dependent on this factor [139]. Thus, slower or less precise visual scanning movements can be generated, as well as different movement patterns depending on the age of the participant [138].

Gender-dependent differences in attentional and cognitive patterns are of great interest in different fields, which has given rise to a large number of studies analyzing the influence of this factor [30, 83, 136,137,138]. In any case, these differences will be affected by the type of task and stimulus and other factors (motivation, interest, etc.).

The different attentional and cognitive capacities of the participants, due to certain disorders or conditions, will result in the recording of different patterns. Thus, for example, individuals with ADHD (attention-deficit/hyperactivity disorder) exhibit particular eye movement patterns and fixation durations [150]. On the other hand, the analysis of eye tracking data collected from children may involve different techniques than the analysis of data collected from adults due to differences in attentional control and cognitive development.

Although there are not as many studies in this regard, cultural differences have also been of interest and have yielded results to be taken into account when interpreting the records [151, 152]. For example, reading patterns (from left to right or vice versa), as well as a greater interest in objects or people, may vary according to cultural differences among participants in this type of test.

Depending on the specific experimental task and what is to be evaluated (a software, web page, or educational material), the subject’s previous experience and knowledge will have an impact on the type of visualization performed [141,142,143].

In summary, it is important to consider the participant’s profile when designing and interpreting eye tracking studies, as individual differences can influence eye movements and gaze behavior.

5.3.3 [R14] Adapt the test to specific groups: children and people with disabilities

When conducting eye tracking sessions with children or people with disabilities, it is important to make adaptations to ensure that the sessions are appropriate and accessible to them. Some of these adaptations involve simplifying the instructions, making them clearer and more concise, and ensuring that they are understood. In some cases, it is also necessary to adapt the subjective assessment questionnaires or to involve family members or caregivers in their completion, as well as to include visual coding in both the statements and the response options.

In the case of tests with children, both the stimuli and the marks to be used in the calibration phase should be appropriate to their age and, as far as possible, be attractive and motivating for them (colorful, animated), but without being a distraction.

In terms of physical equipment, it is advisable to use larger screens and to activate accessibility features, for example, to increase the font size or contrast of the displayed stimulus. Physical adjustments (e.g., related to height and distance to the recording equipment) should be made when working with children. In the case of participants with physical disabilities, other types of customized adaptations are necessary. For example, in the case of users with cerebral palsy, care should be taken to limit involuntary body or head movements as much as possible, as they could affect the recording.

Adjusting the duration of the session is particularly critical in such cases. People with disabilities may need more time to perform the tests and should be given sufficient time to complete them without experiencing anxiety or stress. Fatigue and boredom can also play a role when conducting sessions with these groups of participants, so breaks during testing are often necessary.

In general, it is important to be flexible and tailor the eye tracking session to the unique needs and abilities of each participant, and take these individual differences into account when analyzing and interpreting the results.

5.3.4 [R15] Avoid sample loss

Sample loss is a problem in any experiment, as it can reduce the statistical power of the study, making it necessary to recruit more participants and replan the development of the experiments.

Sample loss in eye tracking experiments is common and is estimated to be around 20%. This loss can be due to several factors: equipment failure, poor calibration, and participant-related factors (fatigue, distractions, problems or impossibility of calibration, body or head movements during the test, moving the chair closer to the device, or adopting postures that impede tracking, among others). To reduce the risk of sample loss in eye tracking experiments, it is advisable to over-recruit participants and to incorporate corrective measures during the development of the tests (e.g., warn participants before and during the development of the experiment about the adoption of appropriate postures and the need to avoid sudden movements).
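The over-recruitment recommendation can be turned into a simple calculation: if n valid recordings are needed and a fraction p of the sample is expected to be lost, at least n / (1 − p) participants should be recruited, rounded up. The Python sketch below implements this under the assumption that the loss rate is expressed as an integer percentage (which avoids floating-point rounding surprises); the function name is hypothetical and the 20% figure used in the example is the estimate quoted above.

```python
def participants_to_recruit(target_valid: int, expected_loss_percent: int) -> int:
    """Smallest number of recruits whose expected valid share covers the target.

    Solves n_recruit * (100 - loss) / 100 >= target_valid with integer math.
    """
    if target_valid < 0:
        raise ValueError("target_valid must be non-negative")
    if not 0 <= expected_loss_percent < 100:
        raise ValueError("expected_loss_percent must be in [0, 100)")
    retained = 100 - expected_loss_percent
    # Ceiling division without floats: -(-a // b) == ceil(a / b).
    return -(-target_valid * 100 // retained)
```

For instance, to end up with 40 valid recordings under an expected loss of 20%, this calculation suggests recruiting 50 participants.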

5.3.5 [R16] Keep in mind that not all subjects are trackable

Sample loss may also occur because some people do not pass the calibration phase, i.e., they have certain personal characteristics that make them untrackable by the eye tracker. Some of the causes are related to the anatomy of the eye (shape, size, or color of the eye, shape and size of the eyelashes, or drooping eyelids). Certain eye conditions, such as strabismus, as well as the use of contact lenses or glasses, may also interfere with test recording. Researchers should be aware of these limitations and take them into account when designing their experiments and, in particular, when recruiting participants for the tests.

5.4 Analysis, interpretation and result reporting

Finally, some aspects and recommendations are discussed in the final phases of the tests, i.e., during the analysis and reporting of the results obtained.

5.4.1 [R17] Processing of large amounts of complex data

In general, eye tracking data can be complex and difficult to process due to its large volume and variability. Therefore, specialized software and expertise by researchers are often required for proper processing, analysis, and interpretation.

If the objective of the study is to identify patterns and infer meaningful relationships between metrics, the analysis of the data becomes considerably more complicated, requiring the application of more advanced data analysis techniques.

5.4.2 [R18] Check the quality metrics

High-quality eye tracking data are needed to obtain accurate results and reliable interpretations. There are several ways to measure the quality of the records in eye tracking sessions. Among them, the most important are accuracy and precision [153]. Accuracy refers to how closely the recorded gaze points match the actual gaze location on the screen. High accuracy indicates that the recorded data closely reflects the true gaze position, while low accuracy indicates significant deviations from the true gaze position. Precision refers to the consistency of the recorded gaze points. High precision indicates that the device consistently records the same gaze position for the same eye movement, while low precision indicates variability in the recordings.

Another metric to consider when collecting the final data is the quality of each participant’s calibration. Calibration quality refers to how well the device has been calibrated to each participant's eye behavior. Higher calibration quality means that the device more accurately tracks the participant’s gaze points. It is recommended to disregard the records of participants who have presented problems during the calibration phase.
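To make the distinction between accuracy and precision concrete, the following Python sketch computes both from gaze samples recorded while a participant looks at a validation target of known position. It assumes that accuracy is summarized as the mean distance between the samples and the target, and that precision is summarized as the root mean square of the distances between successive samples, which is one common convention; both are expressed here in pixels for simplicity, whereas they are usually reported in degrees of visual angle. The function names are hypothetical.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def accuracy_px(samples: List[Point], target: Point) -> float:
    """Mean distance between the recorded gaze samples and the true target."""
    if not samples:
        raise ValueError("at least one sample is required")
    tx, ty = target
    return sum(math.hypot(x - tx, y - ty) for x, y in samples) / len(samples)


def precision_rms_px(samples: List[Point]) -> float:
    """Root mean square of the distances between successive samples."""
    if len(samples) < 2:
        raise ValueError("at least two samples are required")
    squared = [(x2 - x1) ** 2 + (y2 - y1) ** 2
               for (x1, y1), (x2, y2) in zip(samples, samples[1:])]
    return math.sqrt(sum(squared) / len(squared))
```

A recording in which every sample lands on the same point 5 pixels away from the target would therefore be perfectly precise (0 pixels) but not perfectly accurate (5 pixels).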

5.4.3 [R19] One metric, several interpretations

In some cases, the metrics obtained can be difficult to interpret, or, as in the case of pupil diameter, the same metric can be interpreted in several ways [6]. It is therefore advisable to complement and contrast the metrics obtained with the use of other subjective and qualitative techniques, such as the thinking-aloud method. This method consists of asking each participant to describe verbally what they are thinking, what doubts they have, why they are carrying out an action, or visually exploring one area of interest or another [99]. The combined use of both techniques allows obtaining better conclusions and interpretations of the behavior of the subjects under study. Since during the eye tracking session all the subject’s visual activity is recorded, it is also common and recommended to use the retrospective thinking-aloud (RTA) technique [154,155,156].

5.4.4 [R20] Use heat maps with caution

Heat maps are a popular visualization technique in eye tracking studies to represent the areas of the visual scene that receive the most attention from the participants. Although heat maps can be useful for providing a quick overview of eye tracking data, their use has some limitations [157]. First, they do not provide information about the order of fixations, which is often necessary to understand the processing of a visual scene. Second, they can oversimplify the data: they show an overview of where people are looking but do not provide accurate location data. This can be problematic for studies where the exact location of eye fixations is important. Last, it is difficult to compare heat maps, even more so if they have been created using different scales or normalization techniques. Although heat maps can be a useful starting point for presenting results, it is recommended to complement them with other visualizations, such as scan paths and metrics based on times and fixations, that provide more detailed information about eye movements.
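The loss of ordering information can be shown with a small example. The Python sketch below builds a deliberately coarse heat map by accumulating fixation durations in grid cells, and a scan path as the ordered sequence of visited cells. Actual heat maps are typically smoothed and color-coded rather than binned in this way, so the grid, the cell size, and the function names are simplifying assumptions.

```python
from collections import Counter
from typing import List, Tuple

FixationXY = Tuple[float, float, float]   # x, y, duration in ms
Cell = Tuple[int, int]


def heat_map(fixations: List[FixationXY], cell_px: int = 100) -> Counter:
    """Accumulate fixation durations per grid cell (order is discarded)."""
    grid: Counter = Counter()
    for x, y, duration in fixations:
        grid[(int(x // cell_px), int(y // cell_px))] += duration
    return grid


def scan_path(fixations: List[FixationXY], cell_px: int = 100) -> List[Cell]:
    """Ordered sequence of grid cells visited by the fixations."""
    return [(int(x // cell_px), int(y // cell_px)) for x, y, _ in fixations]
```

Two participants who inspect the same regions for the same time but in opposite order produce identical heat maps and different scan paths, which is why the two visualizations complement each other.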

5.4.5 [R21] Generation of the results report

Communicating the results of eye tracking experiments involves describing and analyzing the collected data in a clear, concise, and meaningful way for the reader [149]. The report should include the interpretation and discussion of the main results (in the light of the hypotheses and the existing literature on the topic), the implications of the findings (at the theoretical, practical, and research dimensions), as well as possible limitations of the study or threats to the validity thereof. Visualizations, such as heat maps or scan paths, can also be included in the final report to complement the findings or, at a glance, to transmit general ideas about behavioral or search patterns.

In the report, it is important to include detailed information about the design of the experiment, the characteristics of the participants, the instruments (questionnaires or metrics used), the model and features of the eye tracking device, the characteristics of the visualized stimuli, as well as the procedures or steps followed during experimentation. This will make the study reproducible for other researchers.

6 Proposal of methodological approach

Having discussed the main factors to be taken into account when conducting eye tracking sessions, we proceed to present a methodological approach or series of steps to follow in order to apply this evaluation technique. The process, shown in Fig. 3, describes the usual steps in usability testing, highlighting with a yellow background the specific stages related to the use of the eye tracking technique. The main tasks (T) that make up each of the main phases (P) will be described below, connecting with the recommendations (R) presented in the previous section that are applicable.

Fig. 3 Methodological approach for eye tracking evaluations

6.1 Planning and experimental design

This methodological proposal includes a first phase of design and planning of the experiment, in which the hypotheses or research questions of the work are defined (R02), the visual stimuli to be analyzed (R09 and R10) and the AOIs are designed (R11), and an appropriate selection of metrics (objective and subjective) is made to validate or answer the research questions (R07 and R08). Regarding the selection of metrics, in Sect. 4, we have compiled our proposal of measures (objective and subjective), which are those that we recommend for use by researchers who want to apply the eye tracking technique to evaluations in HCI and EC projects.

In this first stage, special care must be taken to configure and set up both the eye tracking equipment and the rest of the physical characteristics of the workstation and environment in which the tests will be performed (R03 and R04).

6.2 Pilot test

Once planning and design have been completed, a pilot test should be carried out with a reduced number of users (at least one) to adjust the design of the experiment and to define a properly documented protocol to be followed during the tests (R06).

The pilot study may reveal, for example, that the visual stimuli used in the study are not suitable for the resolution of the screen on which they are to be displayed or that the lighting conditions in the laboratory cause reflections, which may affect the recording. The pilot also makes it possible to refine aspects such as the estimated time to perform the test, the need for height-adjustable chairs, or the angles and separation distances with respect to the tracking device, as well as to elaborate an action protocol, which is very useful when the test may be supervised by different researchers.

This phase is crucial in eye tracking research sessions because it allows researchers to identify and address any technical, logistical, or methodological issues that could compromise the validity and reliability of the data collected during the final test.

6.3 Final test

Once the experiment has been designed and planned, the final test can be carried out. The first phase consists of recruiting the participant sample (R12, R13, and R14); over-recruitment is recommended (R15) to avoid problems related to non-trackable participants (R16) or invalid or low-quality records (R18). A testing schedule should be established (R01) that allows each user to complete the four sub-stages that comprise the test: (1) filling in the pre-test; (2) calibrating the device and adjusting the workstation; (3) performing the task(s); and (4) filling in the post-test.

As for the recruitment method, this can be varied and could be done through social networks, mailing, or by contacting potential participants from a specific organization or community. Sometimes it is necessary to make use of incentives, in the form of monetary compensation or increased scores in a subject, to motivate participants to take part in the tests.
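The over-recruitment recommended above (R15) can be turned into a simple estimate. The sketch below assumes that the researcher has a rough expectation of the proportion of participants whose records will be lost (non-trackable participants or low-quality recordings); the function name and the example figures are hypothetical and are not taken from our studies.

```python
import math


def participants_to_recruit(target_valid, expected_loss_rate):
    """Number of participants to recruit so that, on average, `target_valid`
    usable records remain after losing `expected_loss_rate` of them."""
    if not 0 <= expected_loss_rate < 1:
        raise ValueError("expected_loss_rate must be in [0, 1)")
    return math.ceil(target_valid / (1 - expected_loss_rate))


# Hypothetical example: 30 valid records wanted, 20% of records expected to be lost.
needed = participants_to_recruit(30, 0.20)
print(needed)
```

The loss rate is best taken from the pilot test or from previous sessions with the same device and participant profile, since it varies considerably between setups.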

One of the most important steps that will affect the quality of the records obtained is calibration (R05). This step involves setting up the equipment according to the manufacturer’s instructions and having the participant follow an established calibration procedure, which typically consists of the following steps: (1) The participant is instructed to sit comfortably in front of the eye tracker device and look at a fixed point on the screen; (2) a series of dots or marks are shown at different locations on the screen, which must be followed by the participant's eyes; and (3) as the user follows the sequence of marks, the eye tracker records the position of his or her eyes, creating a personalized calibration profile for each participant.

6.4 Result analysis and reporting

Following the tests, the next phase is the analysis and interpretation of the results. In this step, the first task is to compile the collected measurements. The collected data must be organized and pre-processed (R17), eliminating invalid or low-quality records. For this purpose, it is recommended to consider the quality metrics (accuracy and precision) of the results obtained and discard incorrect records in the final analysis (R18).
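As a minimal sketch of this filtering step, the code below computes two commonly used data quality measures from validation samples recorded while the participant looks at a known target: accuracy, as the mean offset between gaze samples and the target, and precision, as the root mean square of the distances between successive samples. For simplicity the values are in pixels rather than degrees of visual angle, and the thresholds, sample data, and function names are assumptions made for illustration only.

```python
import math


def accuracy(samples, target):
    """Mean Euclidean offset between gaze samples and the known target point."""
    return sum(math.dist(s, target) for s in samples) / len(samples)


def precision_rms(samples):
    """Root mean square of the distances between successive gaze samples."""
    steps = [math.dist(a, b) for a, b in zip(samples, samples[1:])]
    return math.sqrt(sum(d * d for d in steps) / len(steps))


def keep_recording(samples, target, max_offset=40.0, max_rms=10.0):
    """Keep a record only if both quality measures are within the thresholds.
    The default thresholds are hypothetical and must be set per study."""
    return accuracy(samples, target) <= max_offset and precision_rms(samples) <= max_rms


# Hypothetical validation samples (pixels) for a target shown at (500, 400).
target = (500.0, 400.0)
good = [(503.0, 404.0), (503.0, 404.0), (506.0, 408.0)]
bad = [(600.0, 400.0), (600.0, 400.0)]

for name, samples in (("good", good), ("bad", bad)):
    print(name, round(accuracy(samples, target), 2),
          round(precision_rms(samples), 2), keep_recording(samples, target))
```

Note that the second record is stable (its sample-to-sample noise is zero) but systematically displaced from the target, so it is rejected on accuracy alone; both measures need to be checked.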

In the case of eye tracking metrics, depending on the type of task and stimulus (static vs. dynamic), gathering the measures may be more or less laborious (R09 and R10). For static stimuli, the analysis is simpler, as it is only necessary to define the AOIs and collect the fixations, times, etc. in each of them for their interpretation. In the case of a video or a fixed-frame video game, it is necessary to segment the video into portions, or sets of frames, for which it is known what appears in the different parts of the visualized scene, and then proceed as in the case of a static stimulus. However, when what is analyzed is the interaction with the user interface of an application or website, the extraction of information becomes considerably more complicated: the records of each user must be analyzed individually, which demands greater resources (time and effort) for processing and extracting the information of interest.
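For the static case, the collection of measures per AOI can be sketched as follows. The example assumes rectangular AOIs and fixations already detected by the eye tracking software; the `AOI` class, the field names, and the sample data are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class AOI:
    """Rectangular area of interest in screen coordinates (pixels)."""
    name: str
    left: float
    top: float
    right: float
    bottom: float

    def contains(self, x, y):
        return self.left <= x <= self.right and self.top <= y <= self.bottom


def aoi_metrics(fixations, aois):
    """Per-AOI fixation count, total dwell time, and onset of the first fixation.
    `fixations` is a list of (onset_ms, x, y, duration_ms) in temporal order."""
    metrics = {
        a.name: {"fixation_count": 0, "dwell_ms": 0, "first_fixation_ms": None}
        for a in aois
    }
    for onset, x, y, duration in fixations:
        for a in aois:
            if a.contains(x, y):
                m = metrics[a.name]
                m["fixation_count"] += 1
                m["dwell_ms"] += duration
                if m["first_fixation_ms"] is None:
                    m["first_fixation_ms"] = onset
    return metrics


# Hypothetical static stimulus with two AOIs and four fixations.
aois = [AOI("title", 0, 0, 800, 100), AOI("diagram", 0, 120, 800, 600)]
fixations = [(0, 400, 50, 180), (200, 300, 300, 240),
             (460, 420, 350, 310), (800, 900, 700, 120)]

result = aoi_metrics(fixations, aois)
for name, m in result.items():
    print(name, m)
```

For a video or fixed-frame game, the same function could be applied once per segment, with a different set of AOIs for each set of frames. The last fixation in the sample falls outside every AOI and is therefore not counted.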

Next, statistical analysis (descriptive, inferential, and correlational) will be performed, or advanced data analysis techniques (machine learning, pattern analysis, etc.) will be applied, as needed to describe and interpret the results according to the hypotheses and research questions posed (R19). It is common to describe the data using descriptive statistics computed from metrics based on counts or durations of fixations and from the questionnaires or other measurement instruments administered. These statistics can be calculated and described for individual participants or groups of users. In the case of tests comparing the behavior of different groups or evaluating different conditions (e.g., different designs or formats of software or educational material), inferential statistics, such as t-tests or ANOVA, or their nonparametric equivalents (Mann-Whitney and Kruskal-Wallis tests), should be included to determine whether significant differences exist between such groups or conditions. If objective and subjective measures are to be included, a correlation analysis (Pearson or Spearman) is recommended to contrast the results obtained from both sources of information.
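The following sketch illustrates two of the statistics mentioned above using only the Python standard library: the Mann-Whitney U statistic for comparing two independent groups and the Spearman rank correlation between an objective and a subjective measure. The data are hypothetical, and the sketch computes the statistics only; obtaining p-values requires statistical tables or a statistics package (for example, SciPy offers `scipy.stats.mannwhitneyu` and `scipy.stats.spearmanr`).

```python
def average_ranks(values):
    """1-based ranks of `values`, with tied values sharing their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def mann_whitney_u(group_a, group_b):
    """Mann-Whitney U statistic (the smaller of the two U values)."""
    ranks = average_ranks(list(group_a) + list(group_b))
    n_a, n_b = len(group_a), len(group_b)
    u_a = sum(ranks[:n_a]) - n_a * (n_a + 1) / 2
    return min(u_a, n_a * n_b - u_a)


def spearman_rho(xs, ys):
    """Spearman correlation: Pearson correlation computed on the ranks."""
    rx, ry = average_ranks(xs), average_ranks(ys)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)


# Hypothetical mean fixation durations (ms) for two interface designs.
design_a = [210, 250, 300, 320]
design_b = [400, 410, 380, 450]
u_stat = mann_whitney_u(design_a, design_b)

# Hypothetical dwell times (s) and self-reported effort scores, one pair per participant.
dwell = [10, 20, 30, 40]
effort = [1, 3, 2, 4]
rho = spearman_rho(dwell, effort)

print("U =", u_stat, "rho =", round(rho, 3))
```

Rank-based statistics such as these are a reasonable choice when sample sizes are small or normality cannot be assumed, which is frequent in eye tracking studies after discarding low-quality records.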

Finally, a report summarizing the results will be produced (R21), in which visual representations may be included, being cautious in the use of heat maps (R20). The limitations of the study or threats to its validity should also be described, together with a detailed explanation of the experiment, so that it can be replicated.

As highlighted in Sect. 3, which described our previous experience, the incorporation of the eye tracking technique complements and enriches traditional evaluation techniques. Its use provides an additional and complementary source of information, which allows for contrasting other sources and helps in their interpretation.

Its incorporation in traditional usability evaluation methodologies and processes is perfectly feasible, as has been shown in this section, although it implies an increase in the time dedicated to planning and to preparing the physical spaces in which the tests are carried out, as well as to conducting the tests and analyzing them afterwards. In addition, it should not be forgotten that it requires a special device and equipment, which implies a cost, and that there is a greater risk of sample loss than with classical techniques. In this sense, we consider that by following the recommendations and guidelines listed in Sect. 5, it is possible to address these problems with greater guarantees of success.

In any case, we consider that the inclusion of this evaluation technique is beneficial in the fields of HCI and EC and allows obtaining and interpreting the results in a more comprehensive manner.

7 Conclusions and future works

Eye tracking can be used in the fields of HCI and EC to measure, through an objective and non-intrusive method, aspects related to the human factor, in particular, the attentional and cognitive processes of users and learners. Its use in HCI allows researchers and designers to identify usability and accessibility problems and develop more effective, user-friendly, and inclusive interfaces. The analysis of user interaction and navigation patterns allows for the identification of which features improve designs versus those that detract from the user experience. In EC, this technique provides valuable information on how students process and learn information, allowing designers to create educational materials and resources that are better adapted to their preferences, interests, and capabilities. Its use also provides insight into the patterns and strategies that learners follow to acquire information according to their profile, e.g., depending on their learning style and their cognitive and attentional abilities.

In our case, it has been applied to evaluate the use of different interaction devices to access educational materials, assess the usefulness and cognitive load imposed by different visual information representations or by different awareness support widgets and components in collaborative learning systems, study the efficiency of different configurations of multimedia educational materials, as well as compare the understandability of notations for the conceptual modeling of interactive and collaborative systems. In all our studies, we have relied on the combined use of objective metrics, provided by the eye tracking device, with other data sources, such as performance measures and participants’ opinions. In this paper, we have compiled the main metrics, both objective and subjective, that may be of interest in studies of this nature.

Consequently, based on our extensive experience over the last ten years in the application of this technique, we have compiled a set of lessons learned, good practices, and guidelines to follow, which, together with a first approach to a methodological proposal, constitute the main contribution of this work. We consider that these contributions may be useful for researchers who wish to use this assessment technique in their work, not only in the particular case of HCI and EC but also in any other domain.

Our research team remains committed to the use of eye tracking in our research, and we will continue to make use of it both in the domains discussed in this paper and in others (such as in the design and evaluation of applications for users with special needs and, in particular, with ASD; see footnote 6).