1 Introduction

The eyes play a pivotal role in decoding visual information. Thus, the observation of eye movements can offer insights into cognitive processing (Holmqvist et al., 2011; Just & Carpenter, 1976). This is likely why eye tracking has been used in numerous studies across different domains (see Duchowski, 2007, for an overview), including research on learning (see Lai et al., 2013, for a review) and mathematics education (Barmby, Andrà, Gomez, Obersteiner, & Shvarts, 2014; Hartmann, 2015; Lilienthal & Schindler, 2019; Mock, Huber, Klein, & Moeller, 2016; Perttula, 2017; Schindler, Haataja, Lilienthal, Moreno-Esteva, & Shvarts, 2018).

The growing number of studies using eye tracking poses challenges for mathematics education researchers. First, the number of studies has arguably reached a point at which it has become difficult for researchers to keep track of them. Eye tracking seems to be widespread across many domains of mathematics education research, which makes it difficult to maintain an overview of its usage, potential, limitations, and specific challenges. In particular, researchers who use eye tracking should follow studies not only in their own domain but also outside of it, as other studies may use a similar methodological approach to a different end.

Second, with the development of new eye-tracking hardware and data analysis software, the possibilities of data collection and interpretation have become increasingly manifold. For instance, using eye-tracking glasses as opposed to a static eye tracker not only introduces the chance to address new research questions but also requires new methodological considerations. Overall, it has become more and more difficult to summarize, compare, and relate results from eye-tracking research. In mathematics education, this trend has been reported in recent years (Lilienthal & Schindler, 2019; Schindler et al., 2018).

Third, among educational research domains, the domain of mathematics is unique in the way it uses text, mathematical symbols, and visualizations and in how these forms of representation are integrated (Andrà et al., 2009; Ott, Brunken, Vogel, & Malone, 2018). Accordingly, studying people’s eye movements during mathematical activities (e.g., reading of mathematical proofs) may require different or modified approaches than those applied successfully in other domains (e.g., text reading or science education). However, these specific affordances of the eye-tracking method in studying mathematics are not well understood.

Lastly, manufacturers have greatly improved the usability of eye-tracking devices and made them more affordable so that an increasing number of researchers has gained access to eye-tracking devices. However, using eye tracking in research still requires complex knowledge, which many researchers in mathematics education had to acquire by themselves (Lilienthal & Schindler, 2019). Accordingly, the mathematics education community has seen an increasing call for guidance and clarity about the possibilities and limitations of the method, as evidenced by working groups at international conferences such as the Annual Conference of the International Group for the Psychology of Mathematics Education (PME; Barmby et al., 2014; Schindler et al., 2018).

Four previous review studies summarized eye-tracking research relevant to mathematics education. Lai et al.’s (2013) review described the state of the art of eye-tracking research in the field of education but was not specific to mathematics education. Moreover, the number of studies using eye-tracking technology may have increased greatly in the 7 years since its publication, so a critical examination of the current potential and challenges of eye tracking seems timely. The other three reviews covered only specific aspects of mathematics education and are thus not comprehensive. A conference paper by Perttula (2017) summarized eye-tracking studies in mathematics education, but it included a limited selection of 28 studies that focused on the use of eye tracking in research on mathematical representations. Mock et al. (2016) reviewed 45 studies in their systematic review; however, it focused solely on the subdomain of numerical cognition. Most recently, Lilienthal and Schindler (2019) reviewed 34 contributions related to eye tracking in the proceedings of the PME. This review illustrated the continued relevance and potential of the eye-tracking method but covered only the proceedings papers of a specific conference rather than the mathematics education literature as a whole. A systematic, comprehensive review of the use of eye tracking in the mathematics education literature does not yet exist. The current paper seeks to fill this gap.

2 Goals and research questions

We present a systematic review, which critically investigates the use of eye tracking in mathematics education, defining three aims and associated research questions.

Our first aim was to provide an overview of all studies using eye tracking in mathematics education. This included domains and topics that were addressed as well as the date and type of publication. Articulating the range of topics can help researchers to become aware of gaps in the literature, distinguish possible benefits of the method for specific areas of mathematics education, and support them in finding relevant eye-tracking research in fields that are similar to their own. Therefore, our first research questions were:

  • RQ1a: How many studies used eye tracking in mathematics education and when and in which journals were they published?

  • RQ1b: In which domains of mathematics education is eye tracking used and what overarching topics have been addressed?

Our second aim was to critically review the methodology of eye-tracking research in mathematics education, including both technical and statistical aspects. To obtain meaningful and reliable data in an eye-tracking experiment, the implementation of the technology must be carefully considered. This concerns, for example, the calibration and setup of the apparatus, the design of the stimuli, and the analysis of the raw data. We analyzed how these issues were addressed in the reviewed studies and what implications should be drawn for future research. Accordingly, our second research question was:

  • RQ2: How was the eye-tracking methodology implemented and what details of this implementation were reported in the studies?

The third aim of the current review was to assess the way in which eye-tracking data are typically interpreted in mathematics education research. The interpretation of eye-tracking data is challenging because the same data may be linked to various cognitive processes (Schindler & Lilienthal, 2019). Interpretations vary depending on the research questions and the particular type of tasks involved. Thus, our third research question was:

  • RQ3: How were eye-tracking data interpreted in mathematics education research?

3 Method

3.1 Paper selection

We selected papers both through a systematic database search and by carefully checking the cross-references of all relevant results. First, we searched the databases Scopus, PsycARTICLES, Education Source, ERIC, Science Direct, Web of Science, and MathEduc, which arguably reflect the most common sources for studies in the field of mathematics education. We considered studies published up to and including 2018. In titles and abstracts, we searched for the string (eye OR gaze) AND (movement* OR track* OR record*) AND (math* OR “numerical cognition”). Duplicates were automatically discarded. This resulted in a total of 1491 studies. In step two, we screened titles and abstracts using the following criteria: (a) The study was published in a journal article, a book chapter, or in conference proceedings; (b) the study was published in English; (c) the topic of the study was broadly related to mathematics education, meaning that the study reported using a mathematical task or investigating mathematical learning in any way. After this screening, 188 studies remained.
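To make the logic of the search string concrete, the sketch below applies it to a title and abstract in Python. It is an approximation under stated assumptions: a trailing `*` is treated as a right-truncation wildcard, matching is case-insensitive, and quoted phrases must occur verbatim. Each database implements truncation, stemming, and phrase search in its own way, so this does not reproduce any database's query engine; the names `QUERY`, `term_pattern`, and `matches` are hypothetical.

```python
import re

# Hypothetical approximation of the review's Boolean search string.
# Assumptions: "*" = right truncation, case-insensitive matching,
# phrases matched verbatim. Real databases differ in these rules.

# Each inner list is an OR-group; the groups are combined with AND.
QUERY = [
    ["eye", "gaze"],
    ["movement*", "track*", "record*"],
    ["math*", "numerical cognition"],
]


def term_pattern(term: str) -> re.Pattern:
    """Compile one search term (possibly truncated with '*') into a regex."""
    if term.endswith("*"):
        stem = re.escape(term[:-1])
        return re.compile(rf"\b{stem}\w*", re.IGNORECASE)
    return re.compile(rf"\b{re.escape(term)}\b", re.IGNORECASE)


def matches(title: str, abstract: str) -> bool:
    """True if title + abstract satisfy every AND-group of the query."""
    text = f"{title} {abstract}"
    return all(
        any(term_pattern(term).search(text) for term in group)
        for group in QUERY
    )
```

For example, a record titled "Eye-tracking study of fraction comparison" whose abstract mentions "mathematics" matches, whereas "Gaze following in infants" does not, because it contains no term from the second or third group.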

In a third step, these studies were coded by the first two authors of this paper. During this step, studies that did not meet the following criteria were excluded: (a) Eye-tracking data were directly related to mathematics education. This applied if a study used eye-tracking data to analyze mathematical abilities, the solution of a mathematical task, or the acquisition of mathematical content, where anything that is part of mathematics curricula was considered mathematical; (b) the article was at least three pages long, since shorter papers did not report enough information about the method to meaningfully review their use of eye tracking; (c) if both a conference paper and a journal article or book chapter reported the same data and analyses, we excluded the conference paper. This third step led to the exclusion of 79 studies.

A fourth and final step was to check the references within all studies that met the aforementioned criteria (i.e., steps two and three), which added another 38 studies. In addition, we included 14 studies from an additional manual search in the Conference Proceedings of the PME.

Eventually, 161 studies were included in this review. Of these, 31 were published in conference proceedings, 5 were book chapters, and the remaining 125 studies were published in journals. It is notable that the total number of studies included in this review was substantially higher than in prior reviews (Perttula, 2017: 28 studies; Mock et al., 2016: 45 studies; Lilienthal & Schindler, 2019: 33 studies).
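The selection counts in Section 3.1 can be checked with a few lines of arithmetic. The following sketch uses only numbers reported above; the variable names are illustrative.

```python
# Paper-selection arithmetic from Section 3.1 (all counts from the text).
database_hits = 1491       # after automatic duplicate removal
after_screening = 188      # step two: title/abstract screening
excluded_in_coding = 79    # step three: exclusions during coding
from_references = 38       # step four: cross-references
from_pme_search = 14       # manual search of PME proceedings

after_coding = after_screening - excluded_in_coding
included = after_coding + from_references + from_pme_search

# Composition of the final sample by publication type.
by_type = {"conference proceedings": 31, "book chapters": 5, "journals": 125}

print(after_coding, included, sum(by_type.values()))  # prints: 109 161 161
```

The final sample therefore consists of 109 studies from the database search plus 52 studies found through references and the PME search, and the breakdown by publication type adds up to the same total.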

3.2 Coding procedure

We coded the studies in seven overarching categories derived from the research questions. With respect to RQ1a and RQ1b, we included the categories (1) publication (e.g., year, journal) and (2) domain and topic (e.g., domain, task type). RQ2 led to the categories (3) apparatus (e.g., manufacturer, sampling rate), (4) stimuli (e.g., task type, presentation, areas of interest), (5) sample and research design (e.g., sample size, procedure, statistical method), and (6) data treatment (e.g., event detection, statistical analysis). RQ3 led to the category (7) interpretation of eye movements (e.g., parameters, interpretation). If the coder considered a categorization unclear, the first three authors of this paper reached a consensus through discussion. After the initial coding, all data were cross-checked for coherence, for example, in the description of the stimulus.

A table with selected information from all reviewed studies can be found in Appendix Table 3. The forthcoming results section is structured according to these seven overarching categories.

4 Results

4.1 Publication

To address RQ1a, we analyzed when and in which journals the reviewed studies were published. In line with previous reviews of eye-tracking research, we found a notable increase in studies between 2006 and 2014 (see Fig. 1). Since 2014, around 20 studies have been published per year; 61% of the studies included in this review were published in 2014 or later. Only a small share (16%) of the journal articles were published in journals specializing in mathematics education. The mathematics education journal with the largest number of eye-tracking studies was the International Journal of Science and Mathematics Education (five studies). Most articles were published in journals focusing on psychology, educational psychology, or eye tracking. The most prevalent journals for the articles in the current review were Acta Psychologica (nine studies), Psychological Research (seven studies), and the Quarterly Journal of Experimental Psychology (seven studies).

Fig. 1 Number of eye-tracking studies in mathematics education per year

4.2 Domains and topics

An overview of the domains and topics of mathematics education that were addressed by the studies reviewed here is given in Table 1. In the following, we briefly summarize these categories. For the specific tasks used in the studies, see Appendix Table 3.

Table 1 Domains and topics of mathematics education addressed by the studies included in the review

Numbers and arithmetic

The majority of studies included in the current review (90 studies; 56%) addressed numbers and arithmetic. Within this larger category, we grouped the studies into seven subcategories. These subcategories necessarily overlap, and the boundaries between them leave room for debate. The first subcategory included 16 studies (10%) investigating the perception, mental representation, and basic processing of single-digit numbers, number words, and non-symbolic numbers, including counting. The second subcategory included eight studies (5%) that used eye tracking to examine how participants represent and process multi-digit numbers. The third subcategory included 18 studies (11%) in which eye tracking was used to investigate spatial-numerical associations (SNAs; Dehaene, Bossini, & Giraux, 1993) in the context of the mental representation of numbers and magnitude (e.g., the mental number line) and of mathematical operations (e.g., the operational momentum effect; Klein et al., 2014). SNAs are assumed to emerge in a variety of numerical cognitive processes (for a critical review, see Shaki & Fischer, 2018). The fourth subcategory included nine studies (6%) that analyzed performance and strategic processes in number line estimation tasks. In a fifth subcategory, 23 studies (14%) examined cognitive processes during basic arithmetic operations with Arabic numerals. For example, studies in this subcategory focused on eye movements during equation solving (e.g., Verschaffel, De Corte, Gielen, & Struyf, 1994) or during more complex arithmetic operations such as multiplication (e.g., Ganor-Stern & Weiss, 2016). The sixth subcategory included five studies (3%) that used eye tracking with infants to study the development of number perception and of the representation of number magnitude. Finally, the seventh subcategory comprised 12 studies (7%) on the processing of rational numbers and proportionality. Most of these studies analyzed how people process the numerical values of rational numbers, predominantly fractions, although Plummer et al. (2017) included decimals and visual (rather than symbolic) representations of rational numbers.
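The percentages in this section are relative to all 161 reviewed studies, not to the 90 studies in this category (e.g., 16 of 161 is 10%, whereas 16 of 90 would be 18%). A short sketch, using the subcategory counts reported above and hypothetical short labels, makes the denominator explicit and shows that the overlapping subcategories need not sum to the category total.

```python
TOTAL = 161  # all studies included in the review

# Study counts per subcategory of "numbers and arithmetic" (from the text);
# the short labels are hypothetical.
subcategories = {
    "single-digit and non-symbolic numbers": 16,
    "multi-digit numbers": 8,
    "spatial-numerical associations": 18,
    "number line estimation": 9,
    "basic arithmetic": 23,
    "infant number perception": 5,
    "rational numbers and proportionality": 12,
}


def share(n: int, total: int = TOTAL) -> int:
    """Percentage of all reviewed studies, rounded to the nearest integer."""
    return round(100 * n / total)


for name, n in subcategories.items():
    print(f"{name}: {n} ({share(n)}%)")

# Subcategories overlap, so their counts need not sum to the 90 studies
# in the whole category.
print(sum(subcategories.values()))
```

Running this reproduces the reported percentages, and the subcategory counts sum to 91 rather than 90 because some studies fall into more than one subcategory.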

Geometry, shape, and form

The close connection between geometrical thinking and visual perception makes eye tracking a pertinent method for research on the perception and processing of geometrical objects. In total, 22 studies in this review (14%) analyzed tasks or abilities in this area. These studies addressed geometric proofs, mental rotation tasks, the construction of geometric objects, the perception of objects in a Cartesian plane, geometric calculations in dynamic problems, and the processing of vector fields and geometric shapes.

Reading and word problem solving

Studies on mathematical reading and word problem solving comprised 21 (13%) of the studies reviewed here. One of the oldest and most common uses of eye tracking is in research on reading (Rayner, 1998; Rayner, Pollatsek, Ashby, & Clifton, 2012). Consequently, the first studies using eye tracking in mathematics education focused on the reading of mathematical texts or word problems. Terry (1921) analyzed fundamental characteristics of reading in mathematics, which he related to the first models of eye movements during reading. This association was also investigated by nine studies that analyzed eye movements while participants solved prototypical word problems. In addition to Terry (1921), six studies considered more complex word problems and longer passages of mathematical text, for example, the integration of illustrations (Dewolf et al., 2015), formulae (Kohlhase et al., 2018), or figures (Lee & Wu, 2018). Finally, five studies analyzed basic processes in decoding and parsing mathematical language.

Reasoning and proof

Fourteen studies (9%) addressed mathematical reasoning and proof. This included studies investigating eye-movement patterns during the reading and validation of proofs or during proportional reasoning. Furthermore, studies in this category addressed probabilistic reasoning, mathematical logic, and functional thinking. Note that studies on reasoning and proof in geometry were included in the category geometry.

Use of mathematical representations

In general, the role of representations and multimedia in learning has been the subject of numerous eye-tracking studies (see Hyönä, 2010; Lai et al., 2013; Mayer, 2010; van Gog & Scheiter, 2010, for overviews). In the present review, 14 (9%) of the studies investigated the role of mathematical representations. In mathematics, information is commonly represented in three different ways: formulae, graphics, and texts. Using eye tracking, Andrà et al. (2009) and Andrà et al. (2015) showed that these representations differ fundamentally in the way that information is retrieved. Several studies analyzed differences in presentation format in various domains: fractions (Atagi et al., 2016), argumentation and proof (Beitlich et al., 2014; Beitlich et al., 2015), propositional logic (Ott et al., 2018), problem solving (Rozek et al., 2014), multiplication (Bolden et al., 2015), word problem solving (Dewolf et al., 2015), and geometry (Lee & Wu, 2018).

Learning difficulties

Eye tracking has been used to characterize and analyze learning disabilities in mathematics. These eight studies (5%) compared typically developing students and students with dyscalculia (Moeller, Neuburger, et al., 2009; van der Weijden et al., 2018; van Viersen et al., 2013), Down syndrome (Abreu-Mendoza & Arias-Trejo, 2015), autism (Winoto et al., 2017), developmental coordination disorder (Gomez et al., 2017), and general mathematical learning difficulties (Schindler & Lilienthal, 2018; van’t Noordende et al., 2016).

Computer-supported learning

Six studies (4%) used eye tracking to analyze learning processes in a computer-supported learning environment. Kiili et al. (2014) and Hernandez-Sabate et al. (2016) used eye tracking to analyze students’ attention to different features of educational mathematics games. Other studies evaluated the usability and effectiveness of learning software, namely, Cinderella (Schimpf & Spannagel, 2011), e-Proof (Roy et al., 2017), and GeoGebra (Yağmur & Çakır, 2016). Olsen et al. (2017) evaluated a collaborative tutoring system using eye tracking by examining joint visual attention.

Mathematical problem solving

Mathematical problem solving was examined in four studies (2%), investigating relations between eye movements and objective/subjective task difficulty (Andrzejewska & Stolinska, 2016), insight problem solving (Knoblich et al., 2001), and the association between body movements and problem-solving processes, assuming effects of embodied cognition (Werner & Raab, 2014). Haataja et al. (2018) used a collaborative problem-solving task to investigate a teacher’s attention during scaffolding.

Statistics

Three studies (2%) focused on statistics. Each study analyzed the interpretation of statistical data, addressing misconceptions in contingency (Fleig et al., 2017) and Bayesian reasoning (Cohen & Staub, 2015), and difficulties in interpreting histograms (Boels et al., 2018).

Affective variables

Affect reflects an important aspect of learning in general and of mathematics education in particular (e.g., Goldin et al., 2016). Two studies (1%) related eye movements to mathematics-specific affective variables, namely mathematical self-concept (Strohmaier et al., 2017) and mathematics anxiety (Hunt et al., 2015).

4.3 Apparatus

Most of the recent studies (since 2014) in our review used equipment from one of three manufacturers (see Holmqvist et al., 2011, for an introduction to the manufacturers): SR Research (Ottawa, Canada; 33%), Tobii (Danderyd, Sweden; 29%), or SMI (Potsdam, Germany; 25%). SR Research models have typically provided higher sampling rates of 500 to 1000 Hz, which might be necessary for comparing temporal measures and reaction times.
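Why the sampling rate matters for temporal measures can be illustrated with simple arithmetic: at a sampling rate of f Hz, consecutive samples are 1000 / f milliseconds apart, so the start and end of an event can only be located to within about one such interval. The helper below is a hypothetical illustration, not part of any manufacturer's software.

```python
def sample_interval_ms(sampling_rate_hz: float) -> float:
    """Time between two consecutive gaze samples, in milliseconds."""
    return 1000.0 / sampling_rate_hz


# Temporal resolution for a few typical sampling rates.
for rate in (60, 120, 500, 1000):
    print(f"{rate:>5} Hz -> {sample_interval_ms(rate):.1f} ms per sample")
```

At 60 Hz, samples are about 16.7 ms apart, whereas at 1000 Hz they are 1 ms apart, so small differences in fixation durations or reaction times are much easier to resolve at high sampling rates.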

Eye trackers are commonly divided into two categories (Holmqvist et al., 2011). Static systems are mounted in a fixed position relative to the stimulus, which is typically presented on a computer monitor. The second category comprises head-mounted devices, in which the eye tracker is attached to a head mount or integrated into eye-tracking glasses. Head-mounted systems can record data on portable devices such as mobile phones; when they do not require a physical connection to a static recording device, we refer to them as mobile. In this review, 132 studies used a static eye tracker and 19 used head-mounted systems, 8 of which were mobile.

Depending on the set-up, the accuracy of the apparatus varies notably (Holmqvist, Nyström, & Mulvey, 2012). With current technology, a set-up that allows free head movement typically provides an accuracy of 0.5° to 1° of visual angle. Accuracy can be increased to about 0.1° by stabilizing participants’ heads, for example, with a chin rest. However, depending on the design of the stimulus, such precision might not be necessary, as the area of good visual acuity extends to about 2° of visual angle (Rayner, 1998). Only 24% of the studies in this review explicitly reported that participants’ heads were free to move. In 37% of the studies, a head rest, head frame, chin rest, or bite bar was reported or could be inferred from the apparatus type. In the remaining 39%, no information regarding movement restrictions was reported or could be inferred.

There are a number of different techniques to record eye movements (see Duchowski, 2007, for an overview). The oldest study reported here (Terry, 1921) used the reflection of an ordinary light beam onto a spooling film roll to record eye movements in one dimension (details of this apparatus are reported in Gray, 1917). In the present review, 151 (94%) of the studies used a video-based pupil/corneal reflection technique. For this technique, one or more cameras record an image of the eye with an infrared light source located next to the camera. Two reference points from the recorded image are compared to determine the orientation of the eye. The first reference point is the reflection of the infrared light source on the cornea, and the second reference point is the center of the pupil. The positional difference between these reference points changes with the rotation of the eye and can be translated to the gaze position (see Duchowski, 2007, for a detailed description).
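The core idea of the pupil/corneal reflection technique, that the pupil-minus-glint vector changes with eye rotation and can be mapped to a gaze position, can be sketched as a calibration step. The sketch below assumes that pupil centers and glint positions have already been extracted from the eye image, and it fits a simple affine mapping by least squares from calibration targets with known screen coordinates. This affine model is a simplification chosen for brevity: commercial trackers typically use higher-order polynomial or model-based mappings, and all function names here are hypothetical.

```python
# Simplified calibration for the pupil/corneal-reflection technique.
# Assumption: screen gaze is an affine function of the pupil-minus-glint
# vector (vx, vy). Real systems usually use richer mappings.


def solve3(a, b):
    """Solve a 3x3 linear system a @ x = b by Gaussian elimination."""
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(3):
        # Partial pivoting for numerical stability.
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, 3):
            f = m[r][col] / m[col][col]
            for c in range(col, 4):
                m[r][c] -= f * m[col][c]
    x = [0.0, 0.0, 0.0]
    for r in (2, 1, 0):
        x[r] = (m[r][3] - sum(m[r][c] * x[c] for c in range(r + 1, 3))) / m[r][r]
    return x


def fit_axis(vectors, targets):
    """Least-squares (a0, a1, a2) for target ~ a0 + a1*vx + a2*vy (normal equations)."""
    rows = [(1.0, vx, vy) for vx, vy in vectors]
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    atb = [sum(r[i] * t for r, t in zip(rows, targets)) for i in range(3)]
    return solve3(ata, atb)


def calibrate(pupil_centers, glints, screen_points):
    """Fit one affine map per screen axis from calibration fixations."""
    vectors = [(px - gx, py - gy)
               for (px, py), (gx, gy) in zip(pupil_centers, glints)]
    cx = fit_axis(vectors, [x for x, _ in screen_points])
    cy = fit_axis(vectors, [y for _, y in screen_points])

    def gaze(pupil, glint):
        """Estimate the screen gaze position from one pupil/glint pair."""
        vx, vy = pupil[0] - glint[0], pupil[1] - glint[1]
        return (cx[0] + cx[1] * vx + cx[2] * vy,
                cy[0] + cy[1] * vx + cy[2] * vy)

    return gaze
```

In use, participants fixate a few known targets (e.g., five points on the screen); `calibrate` is fitted on those recordings, and the returned `gaze` function then converts each subsequent pupil/glint pair into screen coordinates. The residual error of such a fit at the calibration targets is one ingredient of the calibration accuracy that studies can report.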

In four studies, eye movements were manually coded by a human observer during the experiment (Moutsios-Rentzos & Stamatis, 2013, 2015), from video recordings (Canfield & Smith, 1996), or both (Macchi Cassia et al., 2016). Three studies used a search coil technique, which determines the gaze position from a contact lens containing a thin wire coil whose position is detected by a magnetic field surrounding the participant (Duchowski, 2007). A further two studies used electrooculography (EOG), which makes use of a natural, small electric potential between the front and the back of the human eye to detect its movements (Hamada & Iwaki, 2012; Zhou et al., 2012). EOG and search coils are usually used to increase accuracy, while qualitative observation of eye movements is especially advantageous with infants and in authentic learning environments. For one study, it was not possible to identify the eye-tracking technique (Fry, 1988).

Most studies reported details about the stimulus presentation (e.g., monitor size, resolution, refresh rate, or viewing distance). However, information about the eye-tracking apparatus itself was often incomplete: 24% of the studies did not report the sampling rate, and 29% did not mention a calibration procedure; of the studies that did, only 12% reported the accuracy required for accepting a calibration. This information is crucial for interpreting the data, since the inaccuracy of the apparatus combined with a typical calibration error can easily amount to 2° of visual angle, which corresponds to about 2.5 cm on the screen in a regular, static set-up. Depending on the stimulus design and the required precision, this can impair data quality.
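The relation between visual angle and on-screen distance follows from basic geometry: an angle θ viewed from a distance d subtends 2·d·tan(θ/2) on the screen. The sketch below assumes a viewing distance of 70 cm and, for the pixel conversion, a hypothetical monitor 53 cm wide with 1920 horizontal pixels; neither value is taken from the reviewed studies.

```python
import math


def visual_angle_to_cm(angle_deg: float, viewing_distance_cm: float) -> float:
    """On-screen extent (cm) subtended by a visual angle at a viewing distance."""
    return 2 * viewing_distance_cm * math.tan(math.radians(angle_deg) / 2)


def cm_to_pixels(size_cm: float, screen_width_cm: float, screen_width_px: int) -> float:
    """Convert an on-screen extent from cm to pixels (square pixels assumed)."""
    return size_cm * screen_width_px / screen_width_cm


# Assumed set-up: 70 cm viewing distance, 53 cm wide monitor at 1920 px.
error_cm = visual_angle_to_cm(2.0, 70.0)
error_px = cm_to_pixels(error_cm, 53.0, 1920)
print(f"2 deg at 70 cm = {error_cm:.2f} cm = {error_px:.0f} px")
```

Under these assumptions, 2° corresponds to about 2.44 cm (roughly 89 pixels), consistent with the figure of about 2.5 cm given above. Because the on-screen extent grows linearly with viewing distance, the same angular error covers less of the screen when participants sit closer.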

4.4 Stimuli

The design of the stimuli was usually characterized by a trade-off between methodological considerations and authenticity. Some studies used simplified stimuli containing few elements and carefully considered technical details such as background color, font, or the synchronization of the stimulus monitor’s refresh rate with the eye tracker’s sampling rate. Other studies, in contrast, focused on educational authenticity and were more concerned with creating stimuli that were as close to real-world or school mathematics as possible. This included the implementation of dynamic stimuli (e.g., Canfield & Smith, 1996; Schimpf & Spannagel, 2011; Shayan et al., 2017), tablet computers (Abrahamson et al., 2016; Shayan et al., 2017; Yağmur & Çakır, 2016), material from textbooks (Beitlich et al., 2014; Kohlhase et al., 2018; Lee & Wu, 2018), or pictures (e.g., Dewolf et al., 2015). Studies using head-mounted eye tracking sometimes used realistic, paper-based stimuli (Schindler et al., 2016) or authentic classroom interaction (Haataja et al., 2018; Hannula & Williams, 2016). However, these studies needed to take into account that head-mounted eye tracking typically provides a lower resolution and sampling rate than static eye trackers. Moreover, because the eye-tracking device is not attached to the screen showing the stimulus, the gaze coordinates have to be mapped onto the stimulus using a second camera that records the participant’s field of view. This requires considerable computation and introduces further sources of inaccuracy.

For the analyses in 101 papers, areas of interest (AOIs) were defined. These are predefined regions of the stimulus that are used to analyze eye movements on specific elements of the stimulus, such as representations, target words, or keywords (Holmqvist et al., 2011). Importantly, the size of an AOI influences many eye-tracking measures; for example, larger AOIs typically receive more fixations (Holmqvist et al., 2011). Many of the studies that used AOIs (49%) avoided this issue by using AOIs of equal size. However, AOIs of equal size do not necessarily contain the same amount of relevant information. In other cases, AOI data were not compared across AOIs but across trials or participants (e.g., Bolden et al., 2015; Roy et al., 2017). If a standardization was necessary, for example, to compare eye movements on pictures and text that did not have the same size, some studies standardized the AOI sizes by their area (e.g., Alqassab et al., 2018; Beitlich et al., 2015) or used measures that were less strongly affected by AOI dimensions (e.g., revisits, Hegarty et al., 1995; time to first fixation, Bulf et al., 2014).
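Standardizing AOI measures by area, as some of the reviewed studies did, can be sketched as follows. The code counts fixations inside rectangular AOIs and additionally reports fixations per 10,000 square pixels. The class and function names, the rectangular AOI shape, and the scaling unit are assumptions for illustration; the reviewed studies may have standardized differently.

```python
from dataclasses import dataclass


@dataclass
class AOI:
    """Hypothetical rectangular area of interest in screen pixels."""
    name: str
    x: float       # left edge
    y: float       # top edge
    width: float
    height: float

    @property
    def area(self) -> float:
        return self.width * self.height

    def contains(self, px: float, py: float) -> bool:
        return (self.x <= px < self.x + self.width
                and self.y <= py < self.y + self.height)


def fixations_per_area(fixations, aois, per=10_000):
    """Raw fixation count per AOI and count scaled to `per` square pixels."""
    result = {}
    for aoi in aois:
        n = sum(1 for fx, fy in fixations if aoi.contains(fx, fy))
        result[aoi.name] = (n, n / aoi.area * per)
    return result
```

For example, if a text AOI of 400 × 200 px receives six fixations and a picture AOI of 200 × 100 px receives three, the raw counts suggest that the text attracted twice as much attention, whereas per unit area the picture was fixated twice as densely (1.5 vs. 0.75 fixations per 10,000 px²). Which of the two readings is appropriate depends on the research question.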

4.5 Sample and research design

All studies included in this review, except for three (Olsen et al., 2017; Shvarts, 2018a, 2018b), tested participants individually. The 161 studies included a total of 189 experiments. On average, each experiment included a sample of M = 28.56 (SD = 21.70) participants. The majority of studies (59%) included participants from tertiary education, while only 28% included participants from primary or secondary schools.

Of the studies reviewed here, 54% used a within-subject design or a mixed design. Studies that used between-subject designs (22%) usually compared specific populations, for example, a particular age, achievement, or expertise.

4.6 Data treatment

Eye-tracking instruments provide raw data about eye movements, usually in the form of gaze coordinates. Since perception depends on the specific nature of the eye movements (Matin, 1974), for many research questions it is theoretically important to categorize the raw data into events, typically fixations, saccades, and blinks, either manually (by inspecting visualizations of the raw data) or automatically (using an algorithm). During saccades and blinks, almost no visual information is processed, but these events can account for 5 to 15% of the raw data (for typical event durations, see Holmqvist et al., 2011). Several established automatic event detection algorithms are used in eye-tracking research. In most eye-tracking analysis software, these algorithms are applied automatically, but their thresholds can be modified. The choice of both the algorithm and the thresholds influences the filtered data and thus potentially the results of a study (Blignaut, 2009); both should therefore be reported in publications. However, 60% of the studies included in this review did not report an event detection algorithm, and only 21% reported thresholds.
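As an illustration of automatic event detection, and of why thresholds should be reported, the sketch below implements a basic velocity-threshold identification (I-VT) algorithm, one common type of event detection algorithm. It assumes gaze positions already converted to degrees of visual angle and a constant sampling rate. The default threshold of 30°/s is an assumed example value; changing it changes which samples count as fixations. The sketch omits steps that analysis software often adds, such as noise filtering, merging of adjacent fixations, or discarding very short fixations.

```python
import math


def ivt(samples, sampling_rate_hz, velocity_threshold_deg_s=30.0):
    """Velocity-threshold identification (I-VT) of fixations.

    samples: list of (x_deg, y_deg) gaze positions in degrees of visual
    angle, recorded at a constant sampling rate.
    Returns fixations as (start_index, end_index_inclusive, cx, cy).
    """
    dt = 1.0 / sampling_rate_hz
    labels = [False] * len(samples)  # True = sample belongs to a fixation
    for i in range(1, len(samples)):
        (x0, y0), (x1, y1) = samples[i - 1], samples[i]
        velocity = math.hypot(x1 - x0, y1 - y0) / dt  # point-to-point, deg/s
        labels[i] = velocity < velocity_threshold_deg_s
    if len(samples) > 1:
        labels[0] = labels[1]  # first sample has no predecessor
    elif samples:
        labels[0] = True

    # Group consecutive fixation samples into fixation events.
    fixations, start = [], None
    for i, is_fix in enumerate(labels + [False]):  # sentinel closes last run
        if is_fix and start is None:
            start = i
        elif not is_fix and start is not None:
            pts = samples[start:i]
            cx = sum(p[0] for p in pts) / len(pts)
            cy = sum(p[1] for p in pts) / len(pts)
            fixations.append((start, i - 1, cx, cy))
            start = None
    return fixations
```

With samples recorded at 100 Hz, a jump of 2.5° between two samples corresponds to 250°/s and is classified as saccadic, while small jitter of 0.01° (about 1°/s) stays within a fixation. Raising or lowering the threshold moves the boundary between the two, which is why the chosen value affects fixation counts and durations.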

Data loss is a critical issue in studies using eye tracking (Holmqvist et al., 2011) and was substantial in many studies included in this review. When studies reported participant exclusion, this affected an average of 15% of the total sample size. Data loss seems to be especially common in studies with very young or old participants. Lécuyer et al. (2004) used data from only 12 of their 50 four-month-old infants (a data loss of 76%). Ischebeck et al. (2016) had to exclude 21% of their 6-year-old participants, but only 3% of their 8-year-olds in the same experiment. Similarly, Watson et al. (2005) lost about twice as much data in their group of 57- to 79-year-old adults compared with their group of 18- to 25-year-old students.
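Reported exclusion rates depend on the criterion used to decide that a recording is unusable. The sketch below shows one hypothetical criterion, the share of samples with a valid gaze position, with an assumed cut-off of 70%; the reviewed studies used various, often unreported, criteria. The last function reproduces the 76% data loss reported for Lécuyer et al. (2004).

```python
def valid_ratio(samples):
    """Share of samples with a usable gaze position (None marks a lost sample)."""
    return sum(s is not None for s in samples) / len(samples)


def exclude_participants(recordings, min_valid=0.7):
    """Split participants by tracking ratio; the 0.7 cut-off is an assumption."""
    kept = {p: r for p, r in recordings.items() if valid_ratio(r) >= min_valid}
    excluded = sorted(set(recordings) - set(kept))
    return kept, excluded


def data_loss_percent(n_total, n_kept):
    """Percentage of participants whose data could not be used."""
    return 100 * (n_total - n_kept) / n_total
```

For instance, `data_loss_percent(50, 12)` returns 76.0, matching the infant study above. Reporting both the criterion and the resulting exclusion rate allows readers to judge whether the remaining sample may be biased, for example toward calmer or more attentive participants.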

In most of the studies reviewed here (76%), eye tracking was not the only source of data in the experiments. Half of these studies analyzed the other data in relation to eye movements, while the other half analyzed the additional data independently. Frequently, measures such as reaction time and accuracy were analyzed in parallel. When data types were analyzed in relation to each other, it was most often the relation between eye movements and accuracy that was examined. Richer triangulations were done with gestures and communication (Hannula & Williams, 2016; Shvarts, 2018a, 2018b), interviews and stimulated recall (Klein et al., 2018; Shayan et al., 2017; see also Schindler & Lilienthal, 2019), think-aloud protocols and self-reports (Cimen & Campbell, 2012; Green et al., 2007; Ögren et al., 2017; Schindler & Lilienthal, 2018), cognitive load (Lin & Lin, 2014b), affective variables (Hunt et al., 2015; Strohmaier et al., 2017), or skin conductance and EEG (Muldner & Burleson, 2015).

4.7 Interpretation of eye movements

Eye tracking offers possibilities for qualitative and quantitative research. Of the reviewed studies, 66% used a quantitative approach, 22% used a qualitative approach, and 11% used a combination of qualitative and quantitative approaches.

Interpreting eye-tracking data is not straightforward. Lai et al. (2013) and Holmqvist et al. (2011) provide an overview of the most common eye-movement measures in educational research and their theoretical interpretations. Crucially, eye-tracking measures can be interpreted in various ways; in many cases, the same measure can indicate different cognitive processes (Holmqvist et al., 2011). Therefore, this section is structured by the interpretations of eye movements suggested in the reviewed studies, which are listed in Table 2. For each interpretation, we present the most common eye-movement measures and exemplary studies for each measure. For a list of the measures used in each study and their associated interpretations, see Appendix Table 3.

Table 2 Interpretations of eye movements used by the studies included in the review

Visual focus and overt attention, eye mind assumption

Although the studies presented here are diverse in their research interests, the link they draw between eye movements and cognitive processes in mathematics education often builds on similar theoretical considerations. The most common interpretation of eye movements is based on the eye mind assumption (EMA) formulated by Just and Carpenter (1980). It originally stated that “the eye remains fixated on a word as long as the word is being processed” (p. 330) and has since been generalized to the rule that visual focus equals cognitive focus. Subsequent research has shown that this assumption is a strong simplification that does not hold strictly (for a detailed review of research on the association between vision and attention, see Carrasco, 2011; see also Duchowski, 2007; Schindler & Lilienthal, 2019). In general, only so-called overt attention can be directly observed through the position of the visual focus. Attention can also be shifted without moving the eyes, which is referred to as covert attention (Carrasco, 2011). Even though overt and covert attention overlap in the majority of cases (Carrasco, 2011), Schindler and Lilienthal (2019) showed that eye movements and self-reports of the attentional focus often diverge, which indicates that the EMA might not hold under certain circumstances. This ambiguity is not necessarily a reason to discard the assumption entirely, but it calls for researchers to be aware of the limits of its interpretation. In the studies reviewed here, the majority (60%) interpreted eye movements in accordance with the EMA.

A fundamental question that can be addressed by analyzing eye movements is when, whether, and how much a single aspect of a visual stimulus is attended to by a participant (4% of the studies interpreted eye movements this way). The onset of attention can be measured, for example, by the time to first fixation (Schimpf & Spannagel, 2011) or by the position of the first fixation (Ruiz Fernández et al., 2011). Note that these measures can also indicate covert attention indirectly, since covert attention is usually the driving force behind the initiation of fixations (Rayner et al., 2012). The amount of attention to objects or areas of the stimulus is often measured by the number of fixations on the object (Dewolf et al., 2015) or assessed by inspecting visualizations such as heat maps or scan paths (also referred to as gaze paths, Winoto et al., 2017; or as scan patterns or gaze sequences, Holmqvist et al., 2011). For example, Dewolf et al. (2015) evaluated whether students attended to illustrations during word problem solving by counting the fixations on the illustrations. This approach is also popular in evaluating computer-supported learning, since eye tracking can be used to assess whether students notice certain elements of the learning environment (Kiili et al., 2014; Schimpf & Spannagel, 2011). Critically, the amount of attention to an element can be interpreted in two opposing ways: Lin and Lin (2014b) argue that high-performing students spend less time and make fewer fixations on important areas because they extract the information faster (see also Gegenfurtner, Lehtinen, & Säljö, 2011). At the same time, other researchers found that experts are better at differentiating between relevant and irrelevant information and therefore spend relatively more time on relevant than on irrelevant information (e.g., Fleig et al., 2017; Kim et al., 2018).
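As an illustration of how the onset and amount of attention to a single element can be quantified, the following Python sketch computes the time to first fixation and the number of fixations on one area of interest (AOI). It assumes that fixations have already been detected and that the AOI is an axis-aligned rectangle in screen pixels. The `Fixation` and `AOI` types, the function names, and the example values are hypothetical and do not reproduce any particular study or analysis package.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Fixation:
    onset_ms: float      # time since stimulus onset
    duration_ms: float
    x: float             # gaze position in screen pixels
    y: float


@dataclass(frozen=True)
class AOI:
    # Hypothetical rectangular area of interest: (left, top) to (right, bottom).
    left: float
    top: float
    right: float
    bottom: float

    def contains(self, fix: Fixation) -> bool:
        return self.left <= fix.x <= self.right and self.top <= fix.y <= self.bottom


def time_to_first_fixation(fixations: Sequence[Fixation], aoi: AOI) -> Optional[float]:
    """Onset of the first fixation inside the AOI, or None if it was never fixated."""
    for fix in sorted(fixations, key=lambda f: f.onset_ms):
        if aoi.contains(fix):
            return fix.onset_ms
    return None


def fixation_count(fixations: Sequence[Fixation], aoi: AOI) -> int:
    """Number of fixations landing inside the AOI (a proxy for the amount of attention)."""
    return sum(1 for fix in fixations if aoi.contains(fix))


# Example: an illustration occupying the right half of a 1000 x 600 px screen.
illustration = AOI(500, 0, 1000, 600)
trial = [
    Fixation(0, 250, 200, 300),
    Fixation(280, 180, 650, 250),
    Fixation(490, 300, 700, 400),
    Fixation(820, 220, 150, 100),
]
print(time_to_first_fixation(trial, illustration))  # 280
print(fixation_count(trial, illustration))          # 2
```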

Similar to examining single aspects of a visual stimulus, and likewise based on the EMA, researchers can compare the attention allocated to different parts of a visual stimulus. Of the studies included in this review, 17% used this interpretation. The most popular measures for this comparison are the relative number of fixations and the relative fixation duration. For example, Ott et al. (2018) compared the number of fixations and the total fixation duration between text and formulae and between text and graphics. Measures can also be compared between different periods of time: De Corte et al. (1990) compared fixation durations on numbers and on words between participants’ first reading of a word problem and their subsequent reading.
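Relative measures express each AOI's share of the trial total, which makes comparisons between, for example, text and formulae less dependent on overall reading time. The following is a minimal sketch that assumes each fixation has already been assigned to a named AOI and that fixations outside all AOIs have been removed; the labels and durations are made up for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def relative_measures(fixations: Iterable[Tuple[str, float]]) -> Dict[str, Tuple[float, float]]:
    """Return {aoi: (share of fixations, share of total fixation duration)}.

    Each fixation is an (aoi_label, duration_ms) pair; fixations outside all
    AOIs are assumed to have been removed beforehand.
    """
    counts: Dict[str, int] = defaultdict(int)
    durations: Dict[str, float] = defaultdict(float)
    for label, duration in fixations:
        counts[label] += 1
        durations[label] += duration
    n_total = sum(counts.values())
    d_total = sum(durations.values())
    return {label: (counts[label] / n_total, durations[label] / d_total) for label in counts}


# Hypothetical trial: reading a short text that contains a formula.
trial = [("text", 200), ("text", 220), ("formula", 400), ("text", 180), ("formula", 600)]
for aoi, (fix_share, dur_share) in relative_measures(trial).items():
    print(f"{aoi}: {fix_share:.1%} of fixations, {dur_share:.1%} of fixation time")
# text: 60.0% of fixations, 37.5% of fixation time
# formula: 40.0% of fixations, 62.5% of fixation time
```

In this made-up trial the formula attracts fewer fixations but a larger share of fixation time, which is exactly the kind of divergence between the two relative measures that a study would need to interpret.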

In addition, 25% of the reviewed studies analyzed attentional patterns of eye movements, which are often associated with solution strategies. For example, the number and order of transitions between certain aspects of a stimulus were used to distinguish parallel from sequential strategies in number processing (Merkley & Ansari, 2010; Meyerhoff et al., 2012), to identify fraction comparison strategies (Miller Singley & Bunge, 2018; Obersteiner et al., 2014; Obersteiner & Tumpek, 2016), and to assess information integration processes (Alqassab et al., 2018; Crisp et al., 2011; Ögren et al., 2017). As another measure, the position of the first fixation was considered an indicator of a preferred order of information processing (Michal et al., 2016). Moreover, saccade length was used as an indicator of local (short saccades) versus global (long saccades) strategies in information retrieval (Inglis & Alcock, 2012; Klein et al., 2018; Stolinska et al., 2014) and information integration (Godau, Wirth, et al., 2014).
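Transition counts and saccade lengths can both be derived from the sequence of fixations. The sketch below assumes hypothetical AOI labels for the numerators (`n1`, `n2`) and denominators (`d1`, `d2`) of two fractions shown side by side. It approximates saccade length as the Euclidean distance in pixels between consecutive fixation centers; published studies may instead report saccade amplitude in degrees of visual angle taken directly from the event detection output.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple

# Each fixation: (aoi_label, x_px, y_px). Labels and coordinates are hypothetical.
Fix = Tuple[str, float, float]


def transition_counts(fixations: Sequence[Fix]) -> Counter:
    """Count gaze transitions between different AOIs; refixations within one AOI are ignored."""
    counts: Counter = Counter()
    for (a, _, _), (b, _, _) in zip(fixations, fixations[1:]):
        if a != b:
            counts[(a, b)] += 1
    return counts


def mean_saccade_length(fixations: Sequence[Fix]) -> float:
    """Mean Euclidean distance (pixels) between consecutive fixation centers."""
    lengths: List[float] = [
        math.dist((x1, y1), (x2, y2))
        for (_, x1, y1), (_, x2, y2) in zip(fixations, fixations[1:])
    ]
    return sum(lengths) / len(lengths) if lengths else 0.0


# Two fractions side by side: n1/d1 on the left, n2/d2 on the right.
fixations = [("n1", 100, 100), ("n2", 400, 100), ("n1", 100, 100), ("d1", 100, 200), ("d2", 400, 200)]
transitions = transition_counts(fixations)
# Labels share their last character ('1' or '2') when they belong to the same fraction.
between = sum(c for (a, b), c in transitions.items() if a[-1] != b[-1])
within = sum(c for (a, b), c in transitions.items() if a[-1] == b[-1])
print(between, within)                 # 3 1
print(mean_saccade_length(fixations))  # 250.0
```

How such counts map onto strategies (e.g., comparing components across fractions rather than within them) remains a theoretical decision that each study has to justify.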

When a quantitative description of strategies was not possible, a number of studies (14%) used visualizations such as scatterplots (Inglis & Alcock, 2018), heat maps, scan paths, or replays of the eye-tracking recording to detect patterns of eye movements and associated strategies manually (e.g., Lee & Wu, 2018). This approach was often limited to smaller samples because manual analysis is more time-consuming than computerized analysis.
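Heat maps aggregate where, and for how long, participants looked. The following sketch bins fixation durations into a coarse grid covering the stimulus. It is a simplified version under stated assumptions: analysis software typically also smooths the map (for instance with a Gaussian kernel), which is omitted here, and the grid size, stimulus dimensions, and fixations are hypothetical.

```python
from typing import List, Sequence, Tuple


def duration_heat_map(
    fixations: Sequence[Tuple[float, float, float]],
    width: float,
    height: float,
    cols: int,
    rows: int,
) -> List[List[float]]:
    """Sum fixation durations (ms) into a rows x cols grid covering the stimulus.

    Each fixation is (x, y, duration_ms) with the origin at the top-left corner;
    fixations outside the stimulus are ignored. No smoothing is applied.
    """
    grid = [[0.0] * cols for _ in range(rows)]
    for x, y, duration in fixations:
        if 0 <= x < width and 0 <= y < height:
            col = min(int(x / width * cols), cols - 1)
            row = min(int(y / height * rows), rows - 1)
            grid[row][col] += duration
    return grid


# A 200 x 100 px stimulus divided into 2 x 2 cells; the last fixation falls off-screen.
fixations = [(50, 20, 300), (60, 30, 200), (150, 80, 400), (250, 50, 999)]
for row in duration_heat_map(fixations, width=200, height=100, cols=2, rows=2):
    print(row)
# [500.0, 0.0]
# [0.0, 400.0]
```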

Mental representation and covert attention

Beyond measures based on the EMA, 24% of the studies used eye movements as more direct indicators of mental representations (e.g., Klein et al., 2014) and processes (e.g., Hamada & Iwaki, 2012). This approach was found especially in studies on SNAs and number line tasks, where the position of fixations was assumed to map onto a mental, spatial representation of numbers (8%). Additionally, nine studies used the time to first fixation on a target as a measure of reaction time; by placing the targets in different areas of the visual field, SNAs could be observed (e.g., Schwarz & Keus, 2004). Moreover, some studies used imperceptible eye movements such as microsaccades as motor indicators of cognitive representations (four studies; e.g., Myachykov et al., 2016).

Two studies proposed an association between eye movements and brain hemisphericity during mathematical tasks (Moutsios-Rentzos & Stamatis, 2013, 2015).

Other studies used eye movements to investigate covert attention, that is, attention whose focus does not coincide with the visual focus. For example, the position of the first fixation on a stimulus was interpreted as reflecting the preceding focus of covert attention (three studies; e.g., Risko et al., 2013). Attentional anchors, which are goal-oriented perceptual structures in the sensory field that enable better coupling with the environment, were located through gaze patterns (four studies; e.g., Duijzer et al., 2017). In studies with infants, gaze time and saccade latency were interpreted as indicators of the mental anticipation of objects (four studies; e.g., Canfield & Smith, 1996).

Cognitive effort and resources

Another set of interpretations was based on the assumption that eye movements are direct indicators of the cognitive effort involved in decoding visual information. This more fundamental interpretation was used by 17% of the reviewed studies. For instance, the mean fixation duration was used as an indicator of mental workload, cognitive effort, and depth of information processing (e.g., Hodds et al., 2014). Moreover, the number of revisits to specific elements, the total fixation duration, or the time to first fixation were interpreted as indicators of memory capacity (three studies; e.g., Watson et al., 2005).
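Mean fixation duration and the number of revisits are simple to compute once fixations are labeled by AOI, but the definition of a revisit differs between tools. The sketch below assumes that a revisit is counted each time gaze re-enters an AOI after having left it, with the first entry not counted; the labels and durations are hypothetical.

```python
from statistics import mean
from typing import Optional, Sequence, Tuple

# Each fixation: (aoi_label, duration_ms); labels and values are hypothetical.
Fix = Tuple[str, float]


def mean_fixation_duration(fixations: Sequence[Fix]) -> float:
    """Average fixation duration in ms across all fixations of a trial."""
    return mean(duration for _, duration in fixations)


def revisit_count(fixations: Sequence[Fix], aoi: str) -> int:
    """Times gaze returns to `aoi` after leaving it; the first entry is not a revisit."""
    entries = 0
    previous: Optional[str] = None
    for label, _ in fixations:
        if label == aoi and previous != aoi:
            entries += 1
        previous = label
    return max(entries - 1, 0)


trial = [("table", 200.0), ("question", 300.0), ("table", 250.0),
         ("table", 250.0), ("answer", 150.0), ("table", 350.0)]
print(mean_fixation_duration(trial))  # 250.0
print(revisit_count(trial, "table"))  # 2
```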

Joint attention, metacognitive control

The three studies that employed dual eye tracking (i.e., tracking two participants’ eye movements in parallel) used the proximity of the learners’ fixations as a measure of joint attention (e.g., Shvarts, 2018a). Two studies associated eye movements with metacognition, one using the blink rate (Cimen & Campbell, 2012) and the other using scan paths and the total fixation duration (Cohors-Fresenborg et al., 2010).
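One simple way to operationalize the proximity of two learners’ gaze is the share of time-aligned samples in which both gaze points lie within a given radius of each other. The sketch assumes that both gaze streams have already been synchronized and mapped into a shared coordinate system. The radius, the absence of any time lag between partners, and the example coordinates are all assumptions and may differ from the operationalizations used in the reviewed studies.

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]  # gaze position in a shared coordinate system (e.g., px)


def joint_attention_ratio(gaze_a: Sequence[Point], gaze_b: Sequence[Point], radius: float) -> float:
    """Share of time-aligned samples in which the two gaze points lie within `radius`."""
    if len(gaze_a) != len(gaze_b):
        raise ValueError("gaze streams must be time-aligned and of equal length")
    if not gaze_a:
        return 0.0
    close = sum(1 for p, q in zip(gaze_a, gaze_b) if math.dist(p, q) <= radius)
    return close / len(gaze_a)


learner_a = [(100, 100), (200, 200), (300, 300), (400, 400)]
learner_b = [(110, 100), (200, 260), (305, 300), (800, 400)]
print(joint_attention_ratio(learner_a, learner_b, radius=50))  # 0.5
```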

5 Discussion

This review investigated the use of eye tracking in the field of mathematics education research, addressing three research questions. We discuss the results in the order of these questions.

5.1 Overview of the use of eye tracking in mathematics education

In accordance with our first research question, we provided an overview of the domains and topics addressed in mathematics education research using eye tracking, and of when and how these studies were published. Our findings showed that the number of studies in mathematics education that made use of eye tracking has increased rapidly in the last decade and currently stands at around 20 studies published per year. This reflects the ongoing popularity and importance of eye tracking within mathematics education. Eye tracking was used in a wide range of fields within mathematics education, although a majority of studies focused on numbers and arithmetic, in particular on fundamental processes of mathematical thinking such as number perception, counting, and basic arithmetic operations. These studies were often conducted as controlled, systematically designed laboratory experiments and aimed to precisely assess specific cognitive processes that are often impossible to investigate through other methods. However, our review also included a variety of studies that went beyond strictly controlled laboratory settings to include authentic learning situations (e.g., Hannula & Williams, 2016; Kiili et al., 2014; Lin & Lin, 2014b). Here, eye tracking offered a twofold benefit: it allowed for relatively authentic learning environments, and it was an unobtrusive way to gather data about learning processes. In sum, the possibilities offered by eye tracking in mathematics education are diverse, and the method is adaptable to its various subdomains. Recent developments such as mobile and dual eye tracking indicate that the method will continue to evolve, particularly toward more authentic experimental settings. However, the domains found in our review did not cover all mathematical topics in the same depth. Thus, a wealth of opportunity remains for future studies in which eye movements could be informative, for example, in the field of statistics, where only three studies were found (see Boels, Bakker, & Drijvers, 2019, for recent developments).

It should be noted that the scope of this first research question was not to summarize the specific research goals of each study but rather to provide an overview of the topics addressed within these goals. Accordingly, our review analyzes the research from a methodological perspective and does not scrutinize the specific results and implications of each study in its respective field. We acknowledge that this narrow focus omits the findings and implications of these studies. However, we hope that the overview of domains and topics will guide readers toward a more detailed examination of the specific studies.

5.2 Eye-tracking methodology

To answer our second research question, we reviewed technical and methodological aspects of eye-tracking studies in mathematics education. In terms of research design, most studies used within-subject or mixed designs, which are well suited to small samples with large interindividual variance. At the same time, many studies reported a substantial loss of data, which can be problematic when samples are already small. As for the age of participants, most studies included university students, who arguably often represent a convenience sample. Because many research questions in mathematics education concern school-aged children, the predominance of university students in eye-tracking studies in mathematics education seems problematic. Age is an important factor affecting eye movements (Holmqvist et al., 2011; Rayner et al., 2012), so the generalizability of findings from adults might be limited. Moreover, university students typically represent a high-achieving selection of young adults. For these reasons, the issue of generalizability should be acknowledged and systematically addressed in future research.

In addition to research design and participant characteristics, we examined how the eye-tracking method was used in the reviewed studies. Although studies necessarily vary in the specific eye-tracking method they use, we found large inconsistencies in how these methods were reported. Because eye-tracking research offers a variety of options regarding the apparatus itself, its settings, and the data analysis methods, it seems especially important to report all necessary details (see also Holmqvist et al., 2011). Full reporting is also crucial for other researchers to understand or replicate the analysis of eye movements and to evaluate the implications of the findings. We propose that reports include (but not be limited to) a precise description of the apparatus, including sampling rate and average accuracy; the presence or absence of movement restrictions and information on the setup; the size of the stimuli; the distance between the stimuli and the participant; the monitor’s refresh rate; the calibration procedure and calibration accuracy threshold; the event detection algorithm and event detection thresholds; the position and size of any AOIs; the correlations among all measures used; and the amount of and reasons for data loss.
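The reporting items proposed in the preceding paragraph could also be kept as a structured checklist alongside the analysis scripts. The following sketch encodes them as a Python dataclass and lists the items still missing from a report. The field names and units (Hz, cm, degrees of visual angle) are illustrative choices, not an established reporting standard, and the example values are hypothetical.

```python
from dataclasses import dataclass, fields
from typing import List, Optional


@dataclass
class EyeTrackingReport:
    """Hypothetical checklist of eye-tracking reporting items; None means 'not reported'."""
    apparatus: Optional[str] = None
    sampling_rate_hz: Optional[float] = None
    average_accuracy_deg: Optional[float] = None
    movement_restrictions: Optional[str] = None  # e.g., chin rest or free head movement
    setup: Optional[str] = None
    stimulus_size: Optional[str] = None
    viewing_distance_cm: Optional[float] = None
    monitor_refresh_rate_hz: Optional[float] = None
    calibration_procedure: Optional[str] = None
    calibration_threshold_deg: Optional[float] = None
    event_detection_algorithm: Optional[str] = None
    event_detection_thresholds: Optional[str] = None
    aoi_positions_and_sizes: Optional[str] = None
    measure_correlations: Optional[str] = None
    data_loss: Optional[str] = None  # amount of and reasons for data loss


def missing_items(report: EyeTrackingReport) -> List[str]:
    """Names of checklist items that have not been filled in yet."""
    return [f.name for f in fields(report) if getattr(report, f.name) is None]


report = EyeTrackingReport(apparatus="remote eye tracker", sampling_rate_hz=250, average_accuracy_deg=0.5)
print(len(missing_items(report)))  # 12
print(missing_items(report)[:2])   # ['movement_restrictions', 'setup']
```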

5.3 Interpretation of eye movements

Finally, our third research question aimed to assess how eye-tracking data were interpreted in the reviewed studies. Eye tracking not only offers a wide range of possibilities for qualitative and explorative analyses but also provides data suitable for various quantitative analyses. In general, most eye-tracking studies in mathematics education claimed that the method allows for the assessment of cognitive processes that would otherwise not be observable, for example, because they are subconscious. One of the most crucial challenges in eye-tracking research is to properly link eye movements to these assumed underlying cognitive processes. Although reflecting on this link seems obvious, such reflection is by no means ubiquitous in the studies reviewed in this paper. Even when studies analyzed similar cognitive processes, they often made use of numerous or redundant measures of eye movements. Eye-tracking measures can be highly correlated with each other for theoretical or computational reasons, but such correlations are rarely reported. As an example, Merkley and Ansari (2010) reported a correlation of r = .997 between the number of fixations and the number of saccades. This is not surprising, as saccades and fixations alternate in regular reading and event detection algorithms typically infer one from the other (Salvucci & Goldberg, 2000). In such a case, the two measures arguably cannot reflect different cognitive processes, and researchers should consider using only one of them to address a given research question (e.g., Hurst & Cordes, 2016).
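The near-perfect correlation follows from how events are detected. The sketch below implements a bare-bones velocity-threshold identification (I-VT), one of the algorithms described by Salvucci and Goldberg (2000), under simplifying assumptions: gaze positions are already expressed in degrees of visual angle, the threshold of 100°/s is an illustrative value, and no noise filtering or minimum-duration rules are applied. Because runs of fixation and saccade intervals necessarily alternate, the two event counts in this simplified scheme can never differ by more than one.

```python
import math
from itertools import groupby
from typing import List, Sequence, Tuple

# Each gaze sample: (time_s, x_deg, y_deg), positions already in degrees of visual angle.
Sample = Tuple[float, float, float]


def ivt_labels(samples: Sequence[Sample], threshold_deg_per_s: float = 100.0) -> List[str]:
    """Label each inter-sample interval 'fix' or 'sac' by point-to-point velocity (I-VT)."""
    labels: List[str] = []
    for (t0, x0, y0), (t1, x1, y1) in zip(samples, samples[1:]):
        velocity = math.dist((x0, y0), (x1, y1)) / (t1 - t0)
        labels.append("fix" if velocity < threshold_deg_per_s else "sac")
    return labels


def count_events(labels: Sequence[str]) -> Tuple[int, int]:
    """Collapse runs of identical labels into events; return (fixations, saccades)."""
    runs = [label for label, _ in groupby(labels)]
    return runs.count("fix"), runs.count("sac")


# Hypothetical 100 Hz recording: three stable gaze positions separated by two jumps.
xs = [0, 0, 0, 5, 5, 5, 10, 10, 10]
samples = [(i * 0.01, float(x), 0.0) for i, x in enumerate(xs)]
labels = ivt_labels(samples)
print(labels)                # ['fix', 'fix', 'sac', 'fix', 'fix', 'sac', 'fix', 'fix']
print(count_events(labels))  # (3, 2)
```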

Eye movements alone are seldom informative unless the reader is properly guided on their meaning and their relation to the research questions. Thus, eye-tracking studies, like all other experiments, arguably require a plan for data interpretation before they are conducted, and possible implications of the data should be anticipated in advance, even in explorative studies. Apart from reducing the risk of type I errors and post hoc hypotheses, limiting and specifying the measures to be used in advance helps to clarify, compare, and interpret the results (for a discussion of some of these considerations, see Banks et al., 2019).

We found that a majority of studies referred to the EMA as a theoretical foundation for interpreting eye movements. The EMA itself, however, is not universally accepted in the eye-tracking literature, and it has limitations within the domain of mathematics education as well (Anderson, Bothell, & Douglass, 2004; Rayner, 1998; Schindler & Lilienthal, 2019). Schindler and Lilienthal (2019) examined the case of a student solving a geometry problem. Using stimulated recall, the authors identified incidents in which the EMA seemed to hold and others in which it was seemingly violated. Even in incidents in which the EMA did hold, the same eye-movement pattern could have been linked to a variety of cognitive processes. The EMA is arguably a feasible tool for interpreting a majority of eye-movement data, especially in settings in which participants work on visual problems within a limited time, but its suitability should be evaluated case by case. Triangulation with other data sources can help to interpret eye movements meaningfully.

Our review shows that eye tracking was not limited to detecting the visual focus of attention following the EMA. Rather, eye tracking has also been used to assess, among other things, mental representations, cognitive workload, and joint attention in collaborative learning settings. For these processes, eye tracking provides specific benefits. For example, the relationship between mental representations and eye movements is often very strong and can best be observed while participants work on a task rather than afterward (e.g., during counting; Hartmann, Mast, & Fischer, 2016).

The number of different measures used in the studies reviewed here illustrates not only the many possibilities that eye tracking provides but also the difficulty of comparing studies with regard to the measures used and their interpretation. Future research would benefit from drawing more systematically on previous studies and comprehensive guides when choosing eye-tracking measures. For example, Holmqvist et al. (2011) and Lai et al. (2013) offer overviews of eye-movement measures and their interpretations in general, and we provide an overview of the measures used in the mathematics education literature (Appendix Table 3).

The majority of studies interpreted eye-tracking measures in isolation, even when other behavioral or self-reported variables were assessed in parallel. If such data are collected, researchers could analyze and report their relationship to eye-tracking measures; however, this was not common practice in the studies reviewed here. Analyzing think-aloud protocols, interviews, reaction times, or accuracy in relation to eye-tracking data can help to verify and specify the interpretation of eye movements. For example, stimulated recall or interview data can indicate whether assumptions such as the EMA hold in a specific experiment and help to decide how the collected data can or cannot be interpreted (e.g., Schindler & Lilienthal, 2019).

5.4 Three benefits of using eye tracking in mathematics education

Eye tracking has become a prominent method in mathematics education research. However, many, if not all, of the topics covered in the papers in this review had previously been examined without eye tracking. What added value, then, can the method provide? Based on the studies reviewed here, we argue that eye tracking offers unique ways to understand cognitive processes in mathematics education. In many of these studies, eye-tracking measures provided information that could not otherwise have been collected. This was usually the case for one of three reasons:

a) The research referred to a time-critical process rather than an outcome. Mathematical tasks often allow a variety of solution approaches and strategies, which are often not visible in the final solution of a task. Moreover, given enough time, students might use strategies and approaches that are sufficient but not optimal. Thus, observing solution processes without interrupting students is a major challenge that can be tackled through eye tracking (e.g., Inglis & Alcock, 2012; Obersteiner & Tumpek, 2016; van der Weijden et al., 2018).

b) The research included aspects of visualization and mental representations. These questions are typical for mathematics education, since mathematics makes use of visualizations in many forms while mathematical objects are often abstract. Making mental representations of these objects visible is a general challenge that can be approached through eye tracking (e.g., Hartmann, Mast, & Fischer, 2016; Myachykov et al., 2015; Risko et al., 2013).

c) The research referred to cognitive processes that cannot be consciously reported. In mathematical thinking, cognitive processes are often complex and therefore hard or impossible to communicate, particularly for younger students. Other representations or cognitive biases might not even be consciously accessible but are nevertheless reflected in eye movements (e.g., Moeller, Neuburger, et al., 2009; Ott et al., 2018; Watson et al., 2005).

When studies additionally provide a sound and precise theoretical association between eye movements and cognitive processes (e.g., Alqassab et al., 2018; Curtis et al., 2016; Plummer et al., 2017), they usually offer an immediate and unique insight into mathematical thinking. Although a definitive judgment on the value of these findings across all areas of mathematics education was beyond the scope of this review, many authors were convinced that eye tracking offered unique insights that would not have been possible with traditional methods (e.g., written tests or think-aloud protocols) and that advanced their specific field of research.

6 Conclusion

Eye tracking has the potential to allow novel insights into mathematical thinking and learning. In order to make effective use of this potential, future research should strive for more clarity regarding the theoretical foundations underlying the research questions being addressed and the methodological choices being made. Moreover, the interpretation of eye movements should be based on a reasonable assumption of what eye movements measure and what cognitive processes these measures reflect. Considering the large body of studies that already use eye tracking in mathematics education, it is our hope that this review can guide future researchers in this field and support them in using eye tracking in an efficient and reflective way.