1 Introduction

Eye tracking (ET) techniques, which capture the eye movements of persons looking at certain stimuli (Chen, 2011), have gained increasing attention and popularity in mathematics education research in recent years (Andrá et al., 2015). The availability of affordable ET devices has made this technology more accessible and has fueled interest within the field. ET facilitates studying what information persons attend to while solving problems (Andrá et al., 2015), and it has been used to study students’ problem solving in different mathematical subdomains (e.g., Obersteiner & Tumpek, 2016). ET is considered especially beneficial in geometry settings, in which visual information is provided, perceived, and processed (e.g., Epelboim & Suppes, 2001; Muldner & Burleston, 2015).

However, even in geometry tasks, the interpretation of ET data is non-trivial. It typically rests on the eye-mind hypothesis (EMH; see Just & Carpenter, 1980), which states that the eyes fixate on what the mind is processing. The EMH is well supported in reading research, where it was originally developed on the assumption that “the eye remains fixated on a word as long as the word is being processed” (Just & Carpenter, 1980, p. 330). It is unclear, however, whether the EMH also holds for students’ problem solving in mathematics. We see a risk that assumptions from reading research are tacitly carried over to the domain of mathematics. Persons may gaze at a point in space without registering the corresponding object, or recall objects they never fixated on (Holmqvist et al., 2011). Moreover, previous studies indicate that the mapping from ET data to mental processes is in general not bijective: Fixations may, for example, indicate difficulties in extracting information (Jacob & Karn, 2003), heightened cognitive attention (Andrá et al., 2015), mental calculation (Hartmann, Mast, & Fischer, 2015), or bored staring.

This article examines the types of inferences that can be made from ET data on geometry problem solving. Its aim is to investigate to what extent the EMH holds in geometry and whether the use of the EMH and the corresponding ET data analysis need to be refined. In particular, we ask the following questions:

  1. What deviations from the EMH do occur in the domain of geometry?

  2. If the EMH holds: What eye movements and what accompanying mental processes can be identified?

  3. If the EMH does not hold: What eye movements and accompanying mental processes can be identified that are still useful for a domain-specific interpretation?

We draw our findings from a case study: In a stimulated recall interview (SRI), a student watched a gaze-overlaid video of himself solving a geometry problem (see Schindler & Lilienthal, 2019a, for an example) and was asked to describe and explain his thoughts (Schindler & Lilienthal, 2017). The results show the need for a refined approach to ET data analysis that does not naïvely draw on the EMH but interprets eye movements context-sensitively (as suggested by Hayhoe, 2004). We show that not only the instances in which the EMH holds but also those in which it does not may provide valuable information on students’ mathematical problem solving. Our insights constitute a first step towards developing domain-specific interpretation theories in mathematics education, which may reduce the inherent ambiguity and uncertainty in analyzing ET data in this domain (see Hayhoe, 2004).

2 Related work and theoretical considerations

2.1 Eye tracking in mathematics education research

2.1.1 Literature review

ET as used in this work involves two processes: capturing eye movements and determining the gaze points in a video that shows the participant’s field of view. Only the projection onto the observed scene (e.g., the task sheet) allows relating ET data to semantics of visual stimuli (e.g., angles or lines).

ET is increasingly used in mathematics education research. A literature search we conducted using a MathEduc Database query (Footnote 1) confirmed this (Footnote 2): Whereas the number of ET-related papers in mathematics education is generally low compared with other topics (e.g., “fraction” or “function”) and other methods (e.g., “interview”), the number of ET studies has increased recently (Footnote 3). Furthermore, we perceive a trend in the mathematical topics addressed. ET studies in the 1980s and 1990s often addressed word problems (e.g., de Corte & Verschaffel, 1986; Fry, 1988; Hegarty et al., 1992), corresponding to the early use of ET in reading research. In recent years, studies have increasingly concerned mathematical topics such as fractions (Obersteiner & Tumpek, 2016), representations of functions (graphs) (Andrá et al., 2015), visual perception of the coordinate system (Chumachenko, Shvarts, & Budanov, 2014), geometry or spatial problem solving (Chen & Yang, 2014; Lin & Lin, 2014a), and proportion (Abrahamson & Bakker, 2016).

The increased attention to and popularity of ET as a research method, as well as the observed shift of interest towards mathematical subdomains, call for a scientific discussion of how to interpret eye movements in these subdomains and thus motivate this study’s focus on the interpretation of ET data.

2.1.2 Theoretical underpinning of ET data interpretation and analysis

ET data are versatile and rich in information (Holmqvist et al., 2011), yet not self-explanatory—they must be interpreted. In most ET research studies, the interpretation rests on the EMH. Just and Carpenter (1976) summarize:

The critical assumption that underlies our analysis is that under certain circumstances, the eye fixates the referent of the symbol currently being processed. (…) If a number of symbols are processed in a particular sequence, then their referents should be fixated in the same sequence, and the duration of gaze on each referent may be proportional to how long the corresponding symbol is operated on. (p. 139)

Even though studies in areas such as neuropsychology address the limitations of interpreting ET data, studies in mathematics education tend to adopt the EMH: Often, “eye movements are assumed to correspond to mental operations” (Obersteiner & Tumpek, 2016, p. 257), and cognitive processes are “inferred” from gaze patterns without further explanation of, or reflection on, the relation between mind and eye.

Most researchers assume, more or less tacitly, what Radford describes as “classical mental-oriented views of cognition” (p. 111). Eye movements are, for instance, assumed to indicate internal cognitive processes (e.g., König et al., 2016), which, following Gerrig (2013), can be understood as “higher mental processes, such as perception, memory, language, problem solving, and abstract thinking” (p. 207). Other researchers, however, take an embodied perspective on eye movements (e.g., Abrahamson & Bakker, 2016). From this perspective, eye movements are not understood to reflect mental processes but to be an expression of sensorimotor functions (see Radford, 2009) and thus an integral part of cognition. Moreover, vision is understood to entail a fusion of sensations (de Freitas, 2016), and thinking is understood to be multimodal (Radford, 2009). Irrespective of the paradigm, whether one takes a rather psychological or a rather enactivist, embodied perspective, it is relevant and significant for the discipline to clarify to what extent the EMH applies (i.e., whether eyes and thinking are well coordinated) and to investigate what can be inferred from eye movements in mathematics.

In this study, we examine mental processes and the accompanying eye movements. This includes all accessible mental processes, whether conscious or unconscious, controlled or automatic (Bargh & Ferguson, 2000), lower level (e.g., sensory or motor processes) or higher level (e.g., memory, attention, or language) (see Driver, Haggard, & Shallice, 2007). We assume that students’ eye movements are potentially linked to, for instance, their attention, their memory retrieval, or their anxiety. Taking into account all mental processes, irrespective of their level of consciousness and subject only to the condition that they manifest in eye movements, circumvents the hardly achievable differentiation between eye movements that hint at conscious thoughts (e.g., consciously comparing two angles) and those that are likely unconscious (e.g., looking outside the task sheet when recalling a certain procedure).

2.2 Geometry problem solving

Geometry is a mathematical topic that lends itself to ET. Geometry problems typically include diagrams, which provide spatial information that, together with written or oral descriptions, determines the conditions and goals of the problem (Lin & Lin, 2014a). Working on geometry problems involves both extracting information from a diagram and using prior knowledge to solve the problem. Levav-Waynberg and Leikin (2012) emphasize that “geometry combines the necessity of visual skills (…) with the requirement for abstract and logical reasoning” (p. 76). One of the open problems in geometry education concerns, as Sinclair et al. (2016) put it, “the difficulty to interpret the sensory/cognitive dichotomy” (p. 694) with respect to figural apprehension. When interpreting diagrams, many students have difficulty distinguishing relevant from non-relevant characteristics (Gal & Linchevski, 2010). This shows the need to better investigate students’ ways of thinking when solving geometry problems that include visual information: It is important “to look at students’ difficulties and follow their way of thinking” (p. 180). In our view, ET can provide valuable insight into students’ visual perception and cognitive processes in geometry. Our study addresses the potential of ET for geometry education, which may help with several open problems in the field. These include the current lack of theories about geometry thinking and learning, the need to better understand students’ visuospatial reasoning in its complexity (Sinclair et al., 2016), and the investigation and assessment of young learners’ geometry learning; such learners may also struggle with language acquisition, which makes assessment without ET challenging (Dindyal, 2015).

Problem solving in geometry is not a straightforward process: It involves backtracking, skipping ahead, repetitions, and start-overs as solvers read text, search the diagram for patterns, construct elements in the diagram, retrieve prior knowledge from memory, and make inferences (Epelboim & Suppes, 2001). The jumpy and partially recursive nature of geometry problem solving can be observed in ET studies, which even make it possible to observe small steps that do not reach awareness and thus cannot be reported by the solver (see Epelboim & Suppes, 2001). So far, ET research has mainly aimed at distinguishing successful from unsuccessful geometry problem solvers through measures such as fixation duration, saccade length, dwell time, or fixation count (Lin & Lin, 2014a; Muldner & Burleston, 2015), or at investigating whether students’ geometry problem solving was more diagram- or text-directed (Lee & Wu, 2018). The geometry problem-solving process itself, however, has not yet been subject to ET investigation. This research gap is the starting point for our study.
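To make the aggregate measures named above concrete, the following is a minimal sketch of how fixation count, dwell time, and saccade length might be computed once fixations have been detected. It assumes rectangular areas of interest (AOIs) in scene-video pixel coordinates and fixations given as (start_ms, end_ms, x, y) tuples; the `AOI` class and the helper functions are hypothetical illustrations, not the procedure of the cited studies. Dwell time is simplified here to the summed duration of fixations inside an AOI, and saccade length is approximated by the distance between consecutive fixation centres.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class AOI:
    """Hypothetical rectangular area of interest in scene-video pixels."""
    name: str
    x0: float
    y0: float
    x1: float
    y1: float

    def contains(self, x: float, y: float) -> bool:
        return self.x0 <= x <= self.x1 and self.y0 <= y <= self.y1


def aoi_measures(fixations, aois):
    """Fixation count and dwell time per AOI.

    fixations: iterable of (start_ms, end_ms, x, y) tuples.
    Dwell time is approximated as the summed duration of fixations in the AOI.
    """
    stats = {a.name: {"fixation_count": 0, "dwell_time_ms": 0.0} for a in aois}
    for start, end, x, y in fixations:
        for a in aois:
            if a.contains(x, y):
                stats[a.name]["fixation_count"] += 1
                stats[a.name]["dwell_time_ms"] += end - start
                break  # assign each fixation to the first matching AOI only
    return stats


def saccade_lengths(fixations):
    """Distances between consecutive fixation centres (pixel proxy for saccade amplitude)."""
    centres = [(x, y) for _, _, x, y in fixations]
    return [math.dist(p, q) for p, q in zip(centres, centres[1:])]


if __name__ == "__main__":
    aois = [AOI("diagram", 0, 0, 500, 500), AOI("text", 0, 600, 500, 800)]
    fixations = [(0, 200, 100, 100), (250, 550, 120, 130), (600, 700, 50, 700)]
    print(aoi_measures(fixations, aois))
    print(saccade_lengths(fixations))
```

Such aggregates summarize where and how long a solver looked, but, as argued in this article, they do not by themselves reveal the problem-solving process.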

3 Method

3.1 Setting the scene

We studied eye movements in a research project aimed at enhancing mathematical creativity among mathematically interested Swedish upper secondary school students. Over one year, the 16- to 18-year-old students attended a “math club” and worked on rich mathematical problems. Math club sessions took place every second week at Örebro University and lasted 90 to 120 min. Within the project span, we conducted two ET studies, in which the students individually worked on problems in Euclidean plane geometry. In particular, we used multiple solution tasks (MSTs), which are to be solved in multiple ways using different representations, properties, or theorems (Leikin, 2009). MSTs have a methodological advantage for studying geometry problem-solving processes in vivo: First, students work on several solutions, so more data are provided than in single-solution tasks. Second, MSTs trigger processes of switching ideas, building ideas on one another, and thinking flexibly. Thus, MSTs lend themselves to analyzing the problem-solving process in its volatility. In the MST addressed in this article, the students were asked to prove the size of an angle in a regular hexagon and to find different ways to do so (Fig. 1). From previous studies, we knew that this problem is rich and offers multiple affordances (Schindler, Joklitschke, & Rott, 2018). Besides, it is solvable even without extensive background knowledge in geometry and does not require extensive information extraction from text (which we did not intend to investigate).

Fig. 1

Hexagon problem (diagram)

This paper focuses on one particular student, 18-year-old David (pseudonym). We chose David for the SRI for three reasons. First, he had found different solutions to the problem, which provided multiple opportunities to talk about his eye movements and their interpretation in the SRI. Second, the ET data gathered in David’s case were of very good quality: His eye movements were tracked accurately and reliably over the whole time span of his work on the problem. Third, during the project span, David appeared to be strong in explaining his ideas and thoughts and very interested in mathematics. He continued working on problems in his free time and emphasized that he enjoyed mathematical problem solving and reading mathematics books during his leisure time. During the project, it turned out that David was strong in seeing and making connections between different problems, properties, representations, and areas of mathematics. Likewise, he was able to generalize solutions and abstract from concrete situations to more general sets of problems.

The experiment was conducted by two researchers, the first author of this paper and another researcher (referred to as “Raj” (pseudonym) in David’s utterances), both experienced in working with this particular eye tracker. Before David worked on the problem, he put on the ET glasses and underwent a calibration procedure on a notebook screen, which took approximately 3 min. The problem was presented on paper on a solid book stand, allowing David to write and draw without constraints. To facilitate analysis, he was asked to change pen color after every completed solution. He was asked to finish after 15 min and finally stopped after 17 min 30 s.

3.2 Eye tracker and analyzed ET data

For this study, we used ET glasses, which allow students to draw and write when working on paper-and-pencil geometry problems, as they are used to from prior experience. We chose the Pupil Pro headset (Kassner, Patera, & Bulling, 2014), which is easy to set up and use and currently more affordable than many other ET devices. Pupil Pro is relatively unobtrusive (below 100 g), except that it needs to remain connected to a standard computing device (e.g., a notebook). Pupil Pro captures an HD video stream of the scene. The gaze point is mapped onto the scene image with a transfer function that needs to be calibrated before the eye tracker is used. Under ideal conditions, gaze estimation accuracy is 0.6° (Kassner et al., 2014). In our experiments, the accuracy was 1.5° on average.
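To give a feeling for what an angular accuracy means on the task sheet, the small worked example below converts it into an approximate radius of uncertainty using error = d · tan(θ), where d is the distance from eye to sheet and θ the angular accuracy. The viewing distance of 500 mm is a hypothetical value chosen for illustration (the study does not report it), and the formula assumes the sheet is roughly perpendicular to the line of sight.

```python
import math


def on_sheet_error_mm(accuracy_deg: float, viewing_distance_mm: float) -> float:
    """Approximate radius on the sheet corresponding to an angular gaze error.

    error = d * tan(theta); assumes the sheet is roughly perpendicular to gaze.
    """
    return viewing_distance_mm * math.tan(math.radians(accuracy_deg))


# Hypothetical viewing distance of 500 mm (not reported in the study).
for label, acc in [("ideal (0.6 deg)", 0.6), ("this study (1.5 deg)", 1.5)]:
    print(f"{label}: ~{on_sheet_error_mm(acc, 500):.1f} mm on the sheet")
```

Under this assumption, 0.6° corresponds to about 5 mm and 1.5° to about 13 mm on the sheet, which suggests why distinguishing neighbouring corners or short line segments in a diagram from gaze data alone calls for some caution.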

In our study, we use raw data: eye movements as displayed in gaze-overlaid videos. Even though eye movement analysis is usually reduced to the analysis of fixations (moments when the eye remains relatively still, lasting approximately 200 ms up to several seconds) and saccades (quick eye movements of approximately 30–80 ms between fixations) (Chen, 2011; Holmqvist et al., 2011), it is beneficial to use rich raw data because all eye movements are then considered without relying on a specific oculomotor event detection algorithm.
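For readers unfamiliar with oculomotor event detection, the following minimal sketch shows one common family of algorithms, dispersion-threshold identification (I-DT), which groups consecutive gaze samples into a fixation when they stay within a small spatial spread for a minimum duration. It is an illustration of the kind of algorithm the raw-data approach avoids, not a method used in this study and not the detector built into any Pupil software. The sample format (t_ms, x, y) and both thresholds are hypothetical choices; real thresholds depend on the device, sampling rate, and viewing geometry.

```python
from dataclasses import dataclass


@dataclass
class Fixation:
    start_ms: float
    end_ms: float
    x: float  # mean x of the samples in the fixation
    y: float  # mean y of the samples in the fixation


def _dispersion(window):
    """Dispersion as (max x - min x) + (max y - min y) of (t, x, y) samples."""
    xs = [p[1] for p in window]
    ys = [p[2] for p in window]
    return (max(xs) - min(xs)) + (max(ys) - min(ys))


def detect_fixations_idt(samples, max_dispersion, min_duration_ms):
    """Dispersion-threshold fixation detection (I-DT sketch).

    samples: list of (t_ms, x, y) tuples sorted by time.
    Samples not assigned to a fixation are treated as saccades or noise.
    """
    fixations = []
    i, n = 0, len(samples)
    while i < n:
        # Grow the initial window until it spans at least min_duration_ms.
        j = i
        while j < n and samples[j][0] - samples[i][0] < min_duration_ms:
            j += 1
        if j >= n:
            break  # not enough data left for a full window
        if _dispersion(samples[i:j + 1]) <= max_dispersion:
            # Extend the window while the spread stays below the threshold.
            while j + 1 < n and _dispersion(samples[i:j + 2]) <= max_dispersion:
                j += 1
            window = samples[i:j + 1]
            cx = sum(p[1] for p in window) / len(window)
            cy = sum(p[2] for p in window) / len(window)
            fixations.append(Fixation(window[0][0], window[-1][0], cx, cy))
            i = j + 1
        else:
            i += 1  # drop the first sample and try again
    return fixations
```

The choice of `max_dispersion` and `min_duration_ms` directly shapes which eye movements count as fixations; this dependence on parameters is one reason to also inspect the raw gaze-overlaid video. The repeated slicing keeps the sketch readable but is quadratic in the worst case.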

3.3 Stimulated recall interview based on gaze-overlaid video

SRI is, as pointed out by Lyle (2003), a research method “through which cognitive processes can be investigated by inviting subjects to recall, when prompted by a video sequence, their concurrent thinking during that event” (p. 861). Similarly, we wanted to study David’s mental processes based on the stimulus of the gaze-overlaid video of his own problem solving.

The key idea of SRIs is to provide reflection-aiding stimuli to overcome a drawback of introspective methods, which may suffer from incomplete recall. As Stickler and Shi (2017) point out, a gaze-overlaid video used as a stimulus is, like the video in a “regular” SRI, supposed to aid reflection. Beyond that, gaze-overlaid videos may give access to deeper levels of reflection. Stickler and Shi point out that gaze-overlaid videos as stimuli make eye movements visible that were typically neither conscious for the student nor visible for the researchers to begin with. Still, those eye movements were there, and the student unconsciously “saw” them by performing them. Gaze-overlaid videos thus provide a strong stimulus for SRIs, in which the student can consciously perceive their own eye movements in addition to the “regular” video.

However, SRIs also have pitfalls that must be considered when conducting an SRI study (Lyle, 2003). First, we aimed to reduce anxiety in the interview situation by choosing an environment the student was used to, through a trustful personal relationship between the student and the interviewer developed over the project span, and through a positive and interested attitude on the part of the interviewer, who merely indicated interest in David’s descriptions rather than judging them. Second, we aimed to minimize the time span between the student’s problem solving and the SRI. Given the constraints of our project, the SRI took place in the next regular “math club” session. Accordingly, the SRI must be called a “delayed recall” (Gass & Mackey, p. 50). In delayed recalls with a time delay of more than three days, there is an increased risk that participants, due to incomplete memories, react to the stimulus and (re-)invent their mental processes rather than recalling them (see Lyle, 2003). Nevertheless, we believe that David recalled his thoughts accurately in our study, for two reasons. First, as mentioned above, gaze-overlaid videos represent a clear and strong stimulus, which helps participants recall their thoughts (Gass & Mackey, 2000). Second, David was strong in mathematics and skilled in communicating his ideas: His descriptions in the SRI clearly indicated that he could easily recall his original thoughts. This impression was further supported by the fact that he predominantly used the present tense when describing his problem solving and was explicit about what he did not remember, saying “I don’t really remember” or “I’m not really sure” (four times). This connects to findings indicating that people are able to remember the mental processes connected to their eye movements (e.g., Hansen, 1991). Based on these observations, we were confident that David recalled his thoughts and proceeded to analyze the data from the SRI.

For the SRI, we used the video captured by the ET glasses, overlaid with the recorded gaze data using the Pupil Player software (Kassner et al., 2014). Eye movements were displayed as a green dot, with lines connecting gaze points. Prior to the SRI, the interviewer explained the aim of the study, how to identify the student’s visual focus in the video, the procedure of jointly watching, and how to pause the video. Both the interviewer and the student could pause, rewind, and fast-forward the video. We recorded the SRI (76 min) with two cameras.

3.4 Data analysis

First, we transcribed the SRI video. The transcript comprised the interviewer’s and student’s utterances and descriptions of the corresponding eye movements. Because of the explorative and descriptive nature of our research questions, we conducted the data analysis inductively. Based on the comprehensive transcript, we worked out categories of eye movement patterns following qualitative content analysis (Mayring, 2014), in particular inductive category development and summarizing. The first, paraphrasing step aimed to paraphrase content-bearing semantic elements relevant to the research questions (Table 1). These paraphrases included both the eye movements and the interpretation given by David. To describe the eye movements, we re-viewed the respective scenes in the gaze-overlaid video during the analysis. In this step, we also verified whether the EMH held. In the case illustrated in Table 1, for instance, we decided that the EMH did not hold, because David’s gazes on non-meaningful elements in the diagram were not semantically related to his “panic”. The next, transposing step aimed to generalize the paraphrases to a pre-defined abstraction level and to a uniform stylistic level. The category development step aimed to assign categories and descriptions to the transposed paraphrases. After finding all categories, we conducted a category revision step: We perused all data and categories, revised the category system, and partially re-categorized. The final subsumption step aimed to collect all instances matching our categories.

Table 1 Example of data analysis steps

4 Results

4.1 Eye-mind hypothesis

Our analysis indicated that the EMH predominantly held in our study. In many instances, David’s eye movements were related to his attention. This was especially the case when his eye movements focused on the entities given in the geometry task (corners, lines, or similar). Eye movements even hinted at David’s mental processes. However, even when the EMH held, the same eye movement pattern could correspond to different mental processes (Section 4.1.1). This indicates that the interpretation of eye movements remains ambiguous in many instances in geometry settings. In other instances, the EMH did not hold: David’s thinking as described in the SRI was not related to the semantics of the stimuli given in the geometry task, for instance, when he recalled information or planned ahead, or when he was in situations of emotional arousal (Section 4.1.2). Please note that the interpretations of David’s eye movements outlined in this section always stem from David’s own descriptions in the SRI, unless noted otherwise.

4.1.1 Support of the eye-mind hypothesis

David’s fixations on aspects of the diagram were typically related to his attention, though they reflected different mental processes. In some instances, fixations on a corner of the diagram indicated that he was summing up adjacent angles (e.g., Schindler & Lilienthal, 2019b). Other fixations implied that David focused on a mistake he realized he had made earlier (also noticeable in Schindler & Lilienthal, 2019b). Again, his fixations indicated his attention; however, the actual mental process (realizing his mistake) could only be inferred from his utterance in the SRI: “After I calculated that, then I realized that my final answer down here [which he then fixated, authors’ note] was also wrong and so I must have made a mistake.” In another instance, David’s fixations on corners or angles in the diagram indicated that “I’m changing my mindset from working on this problem to working on the new one with the right angles”: He had discovered a new approach to the problem and fixated on the corresponding significant areas in the diagram.

Similarly, a single eye movement pattern could indicate different mental processes. The following five instances refer to a pattern in which David looked back and forth between two corners, following or “drawing” a line with his eyes. David was, for instance, “double-checking” the “symmetry” of two angles (Schindler & Lilienthal, 2019c); thinking about “how can I use the fact that these two [angles, authors’ note] are equal to start determining how big they are?”; envisioning an auxiliary line in his mind (Schindler & Lilienthal, 2019d); comparing the two areas adjacent to an envisioned line with peripheral vision (Schindler & Lilienthal, 2019e); or following the lines of a triangle (David stated he was “thinking if I should do something with this right triangle over here”). In the latter case, the eye movements followed David’s attention on the inscribed figure and his mental process of weighing whether to use this figure in his approach (Schindler & Lilienthal, 2019f).

In all these instances, the EMH holds: David’s attention is on the entities his eyes focus on. However, his mental processes cannot be unambiguously inferred. This finding underpins the need for context-sensitive investigations of which mental processes may be indicated by which eye movement patterns.

4.1.2 Deviations from the eye-mind hypothesis

We observed several instances in which the student’s mental processes were not aligned with his visual attention. David, for instance, fixated on one particular point in the diagram even though he was mentally calculating something else that was not connected to the information in the fixated area (Schindler & Lilienthal, 2019g). He appeared to be thinking about a mistake he had made earlier, “because my calculations for this, the green solution, didn’t work out. So I was trying to find out why this was the case.” Accordingly, his eyes then fixated on his written calculation again. In several other instances as well, David’s eyes focused on particular areas while he was processing something unrelated. In these cases, fixations were frequently on non-meaningful entities in the diagram, such as blank space. Further investigation is needed to determine whether students in these cases actively focus on such areas, consciously or not, to avoid distraction by the given visual information while processing information (as David explained), or whether they rather “gaze into space” (see Section 5). Another interesting (partial) deviation from the EMH concerns David’s symmetry considerations. In one instance, his visual attention was on one half of a triangle, with his eyes predominantly moving within this half (Fig. 2, left) (Schindler & Lilienthal, 2019a). However, David explained that his attention was not only on the right right-angled triangle but also on the whole triangle; he stated that he was figuring out whether he could use these triangles to approach the problem (Fig. 2, right): “I think I always have it in my head that it IS symmetrical, but I only look at one side. And see how... I guess like check a little bit, how does this correspond to the other side? But not very much.”

Fig. 2

Symmetry consideration where the EMH applies only partially

We observed two further interesting eye movement patterns, where the EMH did not hold.

Mentally processing and looking up left

Several times, David made saccades up and to the left, often outside the task sheet (Schindler & Lilienthal, 2019h). In these moments, he thought about how to proceed within his approach or about another approach to the problem, tried to remember a certain procedure, calculated mentally, or processed information in other ways. He explained that directing his eyes to a space without relevant information helped him to “follow the exact same train of thoughts for a couple of seconds” and to avoid distraction by the information provided in the task. Even though the EMH did not hold, this eye movement pattern still allowed us to infer that David was mentally processing something, even if it was not directly linked to particular visual elements of the problem.

Emotional arousal and saccades on non-meaningful entities

In several instances, accelerated eye movements accompanied emotional arousal. In two instances, David described that he was “panicking a little bit” because he noticed a mistake he had made earlier (Schindler & Lilienthal, 2019i) or realized that his calculations did not work out. His eyes wandered around “hectically”, with fast saccades and no fixations on meaningful entities (e.g., corners or lines). In instances of emotional arousal, we also found that ET reliability was notably reduced (Footnote 4). This was also the case when David experienced time pressure at the end of the experiment (“Now I’m really feeling I’m running out of time. (…) I started to get a little bit more stressed, I think you told me, or the man, Raj, told me to finish this solution.”), and his gaze was not tracked reliably (Schindler & Lilienthal, 2019j). A similar decrease in reliability occurred when David was excited about finding a new approach to the problem. Further investigation is needed to determine whether reliability generally suffers from the participant’s bodily reactions (e.g., increased pupil dilation or blinking induced by stress; see Hess & Polt, 1964).
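One way to locate phases of reduced tracking reliability, such as those described above, could be to bin gaze samples into time windows and compute the share of low-confidence samples per window. The sketch below assumes that each gaze sample carries a confidence value between 0 and 1 and is given as a (t_ms, confidence) tuple; it does not assume the column layout of any particular export format. The function name, window length, and the threshold of 0.6 are hypothetical choices for illustration, not values used in this study.

```python
def low_confidence_share(samples, window_ms, threshold=0.6):
    """Share of gaze samples below a confidence threshold per time window.

    samples: list of (t_ms, confidence) tuples sorted by time, confidence in [0, 1].
    Returns a list of (window_start_ms, share) pairs for windows that contain samples.
    """
    if not samples:
        return []
    t0 = samples[0][0]
    buckets = {}  # window index -> (total samples, low-confidence samples)
    for t, conf in samples:
        k = int((t - t0) // window_ms)
        total, low = buckets.get(k, (0, 0))
        buckets[k] = (total + 1, low + (conf < threshold))
    return [(t0 + k * window_ms, low / total) for k, (total, low) in sorted(buckets.items())]
```

Windows with a high share of low-confidence samples could then be cross-checked against the gaze-overlaid video and the student’s SRI statements; whether such windows coincide with emotional arousal in general would remain an empirical question.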

4.2 Opportunities and challenges of analyzing gaze-overlaid videos

The SRI also gave important insights into the opportunities of analyzing gaze-overlaid videos.

Discarded approaches and “hidden” processes

The SRI revealed that gaze-overlaid videos can provide information on students’ approaches that is inaccessible with simple video data. David went into several “dead ends”, that is, mental processes that he eventually discarded. In the SRI, David repeatedly described processes that never expressed themselves in gestures, drawings, or writing. This connects to Epelboim and Suppes’ (2001) assumption that ET data may provide insight into the volatile nature of problem solving. In one instance, when David focused on one triangle, his eyes wandered to another triangle and then back to the first one (Schindler & Lilienthal, 2019k). He commented, “I am wondering whether I could use this corner here [in the second triangle, authors’ note], this angle here, but then I get unsure because it involves lines that weren’t in the figure from the beginning and then I wanted to use the original lines as much as possible”. David’s gestures, drawings, or writing did not indicate this process. In some instances, David explained that he consciously ruled out ideas because of certain ambitions or standards he had, or because of certain norms he perceived. He pointed out, for instance, that he let go of an idea because “it would be a bit too similar or too easy” and because he “wanted to use the symmetry more in the solution”.

Timeliness of eye movements compared with the delay of pointing, drawing, and writing

The SRI repeatedly showed that David’s eye movements preceded his gestures, drawing, and writing. In one case, David’s eyes “drew” a line 20 s before his gestures indicated that he was paying attention to this line; another 10 s later, he drew the line (Schindler & Lilienthal, 2019l). David explained this delay several times during the SRI, stating that he first had a certain thought and then had to verify it before finally putting it on paper. This connects to a finding by Shayan, Abrahamson, Bakker, Duijzer, and Schaaf (2017) that participants in an ET experiment looked at relevant points in a task well before they verbally expressed the corresponding strategy. In another instance, David’s eye movements indicated that he had come up with an approach to the problem without writing anything down. In the SRI, he explained that he held back this solution because he realized that a previous solution was wrong and wanted to correct it first. After working on correcting the previous solution for more than 2 min, he finally wrote down the new solution quickly. He explained, “I could have written this beforehand, but I wanted to correct my previous solution before I made a new one.”

Section 4.1 addressed challenges and limitations in the analysis of gaze-overlaid videos. Eye movements for which the EMH does not hold constitute a particular analytical challenge. Our case study showed, however, that even when the EMH holds, the analysis of students’ mental processes in geometry remains ambiguous. Our results confirm that a bijective relation between students’ eye movements and their mental processes cannot be assumed in the domain of geometry. Another challenge in interpreting eye movements concerns moments when students have an idea, think about it, but then let go of it and start working on another one instead. In particular, it is difficult to determine the “cut”, the very moment when the student stops working on one idea and starts over with another. In our study, we could identify such “cuts” through the SRI. Further research should investigate to what extent such instances can be analyzed using ET data alone.

5 Discussion and outlook

This article investigates to what extent the EMH holds in the domain of geometry and aims to advance the discussion of what inferences can validly be made from ET data. We used a case study to show the need for a refined use of the EMH and a refined account of what can be inferred from ET data. The case study served as a proof by counterexample, showing that the EMH does not always hold and that, even when it does, the interpretation of eye movements may still be ambiguous. Nevertheless, we found that eye movement analysis may shed light on students’ mental processes both when the EMH holds and when it does not. We can confirm Andrá et al.’s (2015) claim that in many instances “we can examine how and which information students are attending to” (p. 241). However, our results also indicate that visual attention to particular locations on the task sheet can be related to different mental processes. For instance, we identified five possible mental processes corresponding to the eye movement pattern of looking back and forth between two corners of a diagram.

We identified interpretations of particular eye movement patterns and their corresponding mental processes, which may serve as a springboard for developing a subdomain-specific theory of eye movement interpretation in geometry. An initial summary is shown in Table 2.

Table 2 Eye movement patterns and their interpretation (mental processes)

As mentioned before, we do not take an enactivist, embodied perspective on mathematical reasoning in this article. Still, it is instructive to ask how the EMH and the results of this article can be understood from such a perspective. Enactivist, embodied approaches assume that not only the brain but also perceptually guided motions serve to reach our goals, and they take a holistic view of the coordination of action and perception, and of body (gestures, movements, eye movements) and brain. From this perspective, a person’s eyes and thinking may be out of sync; that is, perception and thinking may be decoupled or poorly coordinated, especially in breakdown situations or when acting is not intentional. In our study in geometry, eyes and thinking were not well coordinated, for example, in situations of emotional arousal (excitement, stress, or panic) or when David was recalling or processing information, planning ahead, or mentally calculating. In these situations, his thinking was anchored not only in the situation at hand but also in previous situations (prior experiences) or future ones (planning ahead). However, further work is needed to better understand and theorize eye movements from an embodied perspective.

Our results suggest that some ambiguity will remain in the interpretation of ET data, even with a subdomain-specific interpretation. We envision a tool for subdomain-specific interpretation that exposes this remaining ambiguity using the available knowledge about possible interpretations. Such a tool could initially help researchers analyze students’ geometry problem solving by semi-automating the interpretation step, especially by sequencing ET data and highlighting alternative interpretations (Fig. 3). In the long run, such a program could become a tool for teachers.

Fig. 3 Illustration of the envisaged subdomain-specific interpretation tool (eye movement pattern: looking back and forth between two corners)
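The sequencing step of such a tool could, for instance, scan a fixation sequence for the back-and-forth pattern shown in Fig. 3 and attach every interpretation listed for that pattern, rather than choosing one. The following is a minimal sketch, assuming fixations have already been mapped to areas of interest (AOIs) such as the two corners; the function names, the `min_switches` threshold, and the placeholder interpretation labels are hypothetical and not part of an existing tool.

```python
from typing import Dict, List, Sequence, Tuple

Run = Tuple[int, int, Tuple[str, str]]  # (first fixation index, end index exclusive, AOI pair)


def find_back_and_forth(aois: Sequence[str], min_switches: int = 3) -> List[Run]:
    """Find runs in which gaze alternates between exactly two AOIs (A, B, A, B, ...)."""
    # Collapse consecutive fixations on the same AOI into visits: (label, first fixation index).
    visits: List[Tuple[str, int]] = []
    for i, label in enumerate(aois):
        if not visits or visits[-1][0] != label:
            visits.append((label, i))
    runs: List[Run] = []
    i = 0
    while i < len(visits) - 1:
        j = i + 1
        # Extend while each new visit repeats the label from two visits earlier.
        while j + 1 < len(visits) and visits[j + 1][0] == visits[j - 1][0]:
            j += 1
        if j - i >= min_switches:  # j - i = number of gaze switches in this run
            end = visits[j + 1][1] if j + 1 < len(visits) else len(aois)
            runs.append((visits[i][1], end, (visits[i][0], visits[i + 1][0])))
            i = j  # the last visit may start the next pattern
        else:
            i += 1
    return runs


def annotate(runs: List[Run], candidates: List[str]) -> List[Dict[str, object]]:
    """Attach all candidate interpretations to every run: the ambiguity is exposed, not resolved."""
    return [{"start": s, "end": e, "aois": pair, "candidates": list(candidates)} for s, e, pair in runs]


fixations = ["corner_A", "corner_B", "corner_A", "corner_B", "text"]
print(annotate(find_back_and_forth(fixations), ["interpretation 1", "interpretation 2"]))
```

In practice, the candidate list would be filled from a summary such as Table 2, and the researcher would still decide among the highlighted alternatives.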

Future work could exploit the technical opportunities offered by ET technology further. For instance, whether a student really focuses on a particular point or rather gazes into space could be investigated by analyzing binocular eye movements: simply put, an inward turning of the eyes (convergence) hints at a focus on the task sheet, whereas parallel gaze directions indicate gazing into space. We also believe that considering students’ pupil dilation will be valuable for investigating their cognitive load (as shown, e.g., by Lin & Lin, 2014b).
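The geometry behind the binocular idea can be sketched as follows, assuming the eye tracker reports a gaze direction vector for each eye in a common head-fixed coordinate system (whether and how a given device exports such vectors is an assumption here). The angle θ between the two vectors is the vergence angle; for a target straight ahead, the gaze lines cross at distance d = (IPD/2)/tan(θ/2), where IPD is the interpupillary distance. The default IPD of 63 mm and the function names are illustrative assumptions.

```python
import math
from typing import Sequence


def vergence_angle_deg(left_dir: Sequence[float], right_dir: Sequence[float]) -> float:
    """Angle in degrees between the left- and right-eye gaze direction vectors.
    A value close to 0 means (nearly) parallel gaze, i.e., looking into the distance."""
    dot = sum(l * r for l, r in zip(left_dir, right_dir))
    norms = math.hypot(*left_dir) * math.hypot(*right_dir)
    cos_angle = max(-1.0, min(1.0, dot / norms))  # clamp rounding errors before acos
    return math.degrees(math.acos(cos_angle))


def convergence_distance_mm(angle_deg: float, ipd_mm: float = 63.0) -> float:
    """Distance to the point where the gaze lines cross, assuming a target straight
    ahead (symmetric vergence): d = (IPD / 2) / tan(angle / 2)."""
    if angle_deg <= 0.0:
        return math.inf  # parallel gaze: the lines never cross
    return (ipd_mm / 2) / math.tan(math.radians(angle_deg) / 2)


# Eyes 63 mm apart looking at a point on a sheet 400 mm straight ahead:
angle = vergence_angle_deg((31.5, 0.0, 400.0), (-31.5, 0.0, 400.0))
print(round(angle, 2), round(convergence_distance_mm(angle)))  # about 9 degrees, 400 mm
```

Comparing the estimated distance with the known distance of the task sheet would then suggest focus versus gazing into space; in practice, vergence-based estimates are likely to be noisy, so any threshold would need empirical validation.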

Based on our findings, we are convinced that ET offers various opportunities for geometry education. As illustrated in this article, ET may shed light on how visual stimuli given in the situation at hand and information recalled from prior experiences interact, and may thus contribute to better understanding and theorizing the sensory/cognitive dichotomy in figural apprehension (see Sinclair et al., 2016). Our study highlights in particular the opportunities of analyzing gaze-overlaid videos, which can provide insights into approaches and mental processes that might otherwise remain entirely undetected. Notably, in many instances ET data appear to precede students’ writing, drawing, and gestures by a time gap of varying length. We think that this could be a fruitful area for further research, especially given the importance of gesture and drawing in geometric thinking. In real-time (on-line) settings, researchers, educators, and eventually educational software could immediately gain insights into what students attend to (and what they process) and accordingly give supportive feedback and interact with students based on the eye movement data. David, our 18-year-old participant, also perceived the potential of ET. He finally summarized:

I mean it’s really exciting that you… When you see that. I mean you… Most of the things, at least most of the things you CAN see from the eye tracking. That it IS, I mean the vast majority of it IS correct. Like what you think that I think.

However, future research will have to carefully consider to what extent ET data can be interpreted without SRI. In our methodological article, the SRI served to gain insights into ET and the EMH. Even though the EMH did not hold in all instances in our study, we still consider it a valuable approximation that can serve to make inferences from ET data. Holmqvist et al. (2011) argue that “it is impossible to tell from eye-tracking data alone what people think” (p. 71). We would refine this statement: it is difficult to tell unambiguously from ET data what people think. This calls for triangulating ET with other research methods in mathematics education, such as thinking aloud or recall interviews, or with research methods from neuroscience. Still, we believe that through studies such as ours, and through further studies with many students, researchers may in the long term be able to make inferences from ET data alone. We believe that it may be possible to narrow down the number of interpretations (based on patterns learned from big data and inferences from different cues and temporal sequences) so that a smaller set of possible interpretations (mental processes) and their respective likelihoods can be inferred. Furthermore, in other mathematical subdomains or topics (e.g., quantity recognition), the number of interpretations may be smaller, interpretations less ambiguous, and the EMH more likely to hold. We believe that mathematics education needs a discussion of what inferences can validly be made from ET data on a subdomain-specific level. We hope that our article can fuel such a discussion.
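One simple way to attach likelihoods to a pattern’s candidate interpretations would be to count how often each interpretation was confirmed (e.g., through SRI) across many coded instances of that pattern. The following is a minimal sketch, assuming such coded data exist; it uses plain relative frequencies with additive (Laplace) smoothing, an illustrative choice rather than a method used in this study, and the labels and counts are hypothetical. Models that also draw on cues and temporal sequences, as envisioned above, would need to go further.

```python
from collections import Counter
from typing import Dict, List


def interpretation_likelihoods(observed: List[str], candidates: List[str], alpha: float = 1.0) -> Dict[str, float]:
    """Relative frequencies of the interpretations coded for one eye movement pattern,
    with additive smoothing so that no listed candidate gets probability 0."""
    counts = Counter(label for label in observed if label in candidates)
    total = sum(counts.values()) + alpha * len(candidates)
    return {c: (counts[c] + alpha) / total for c in candidates}


def plausible(likelihoods: Dict[str, float], min_p: float) -> List[str]:
    """Narrow down to interpretations whose estimated likelihood reaches min_p, most likely first."""
    return sorted((c for c, p in likelihoods.items() if p >= min_p), key=lambda c: -likelihoods[c])


# Hypothetical labels and counts, for illustration only:
probs = interpretation_likelihoods(["P1", "P1", "P2"], ["P1", "P2", "P3"])
print(probs)                  # P1: 0.5, P2: about 0.33, P3: about 0.17
print(plausible(probs, 0.3))  # ['P1', 'P2']
```

The smoothing parameter `alpha` controls how strongly rarely observed interpretations are kept in play; with few coded instances, a larger value keeps the remaining ambiguity visible.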