1 Introduction: theoretical perception in geometry

Geometry is an important domain of mathematics. Jean Dieudonné wrote: “If anybody speaks of ‘the death of Geometry’ he [sic] merely testifies to the fact that he [sic] is utterly unaware of 90% of what mathematicians are doing today” (Dieudonné, 1981). Students have various difficulties in learning geometry (Fujita & Jones, 2007; Sears & Chávez, 2014; Smith, 2010): understanding definitions and classifications, constructing proofs, and recognizing geometric shapes, particularly in non-prototypical positions (Hershkowitz, 1989; Presmeg, 2008). These difficulties are related to the main peculiarity of geometry education: the constant need to relate visual and conceptual aspects of geometric materials. Initially, geometry education relied heavily on definitions and Euclidean postulates (Del Grande, 1990; Van Hiele, 1999). In response to the dominance of this formalistic approach, educators nowadays highlight the role of visual diagrams and their exploration (Freudenthal, 1972; Sinclair et al., 2016; Van Hiele, 1999).

Yet research in geometry education still calls for a theory that would explain how students come to see mathematics in visual inscriptions (Duval, 1998; Presmeg, 1992). Traditionally, perception is understood as seeing spatial structures determined by pre-given laws of gestalt organization, and conceptual reasoning is seen as naming and discursively operating with conceptual information. To explain the connection of visual materials with mathematical concepts, Fischbein (1993) talks about figural concepts that are described as representations of an intermediate level between visuals and propositional systems. Figural concepts maintain structural resemblance with pictures, yet appear to be more general than an individual sketch. Duval (1998) introduces special types of cognitive processes—sequential and operative apprehensions—as mechanisms that supplement perceptual apprehension in visual perception and discursive apprehensions in conceptual discursive practices. Those theoretical models overcome the tension of visual perception and conceptual reasoning by introducing new types of cognitive processes.

However, Fischbein (1993) and Duval (1998) both acknowledge that their local theories within mathematics education do not comply with the grand theory (diSessa & Cobb, 2004) that they are using, namely the cognitive science of the twentieth century. Greatly simplified, traditional cognitive science is based on linear information processing (Atkinson & Shiffrin, 1968): sensorial qualities are processed into perceptual images based on pre-given laws of structure. Conceptual knowledge is often understood as a system of amodal symbolic representations not associated with any specific external source (visual, audial, etc.) (Fodor & Pylyshyn, 1988), which some authors extend by a separate perceptual module in the cognitive system (Paivio, 1990). Fischbein (1993) directly points out that information-processing theories are insufficient for grasping how conceptual and visual aspects interlace within geometrical reasoning. Duval acknowledges that mathematical knowledge appears to be paradoxical from the cognitive representations-based view that he develops: “From a cognitive view, the essential fact is the paradoxical character of the mathematical knowledge, which excludes any resort to mental representations as direct grasping of mathematical objects, at least in the didactical context” (Duval, 1999, p. 15). More broadly, grounding the local theories in mathematics education in the cognitivist idea of mental representations leads to epistemological problems (Thompson & Sfard, 1994): mathematical objects cannot be aligned with external entities; thus, it is not clear what such internal representations are meant to represent.

An alternative perspective on cognition is offered by radical embodied cognitive science (Chemero, 2009). From this perspective, a subject actively and continuously interacts with the environment while maintaining existence or solving tasks (Maturana & Varela, 1992; Wilson & Golonka, 2013). Critically, those approaches treat the role of perception differently from information-processing theories: perceptual structures do not arise from processing the flow of information in a bottom-up manner; instead, they develop to serve action. Radical embodied approaches largely theorize perception and motor actions, which are considered processes of low-level cognition (Gibson, 1986; Kelso & Schöner, 1988). Expanding those approaches to higher-level processes is still a challenging task (Hutto & Abrahamson, 2022; Sanches de Oliveira et al., 2021). Mathematical thinking and learning appear to be a perfect ground for these theoretical steps in cognitive science (Hutto, 2019). Simultaneously, coherent theorizing of perception in geometry is powerful for education: it potentially sublates the concept-visuals dichotomy and contributes to a pedagogy that would naturally embrace visual and conceptual aspects of geometrical thinking.

In this paper, we expand radical embodied approaches to higher-level phenomena by combining them with insights from the cultural-historical approach, thus joining previous efforts of cognitive scientists (Baggs & Chemero, 2020) and mathematics educators (Abrahamson & Trninic, 2015). The cultural-historical approach highlights perception as a higher psychological function organized in a cultural and systemic way (Vygotsky, 1930/1997). Following these ideas, theoretical perception is understood as an ability to recognize conceptual geometric aspects in visual figures (Radford, 2010). This phenomenon has acquired multiple names, such as educated perception (Goldstone et al., 2010) and professional vision (Goodwin, 1994), highlighting a cultural ability to directly perceive culturally relevant, e.g., mathematical, structures in the environment. Terminologically, we use the word “shape” for a geometric concept or structure that is recognized and the word “figure” for a given picture within a visual environment. Likewise, we will use the terms “shape recognition” and “figure perception.”

How theoretical perception of shapes is grounded in active sensory-motor operations within a visual environment is still unclear. We use two sources for clarifying this connection between theoretical perception and sensory-motor interaction with the environment. First, we bring in the physiology of activity, an approach developed by Nikolai Bernstein (1947/1967). The physiology of activity was initially developed in tight interaction with the cultural-historical approach, as Bernstein interacted with Vygotsky (Feigenberg, 2014), and his discoveries on movement construction grounded the research of many authors within the cultural-historical approach, such as Luria, Zaporozhets, and Gippenreyter. Later, the physiology of activity became the basis of the contemporary science of movement construction (Kelso, 1982; Thelen, 2000; Turvey, 1977), which grounds radical embodied approaches (Chemero, 2009). Second, we bring in studies from the cultural-historical approach that are largely unknown to the English-speaking research community. The theoretical research question of this paper is: What is the role of sensory-motor processes in recognizing geometric shapes, and how does it change in the cultural development of theoretical perception in geometry?

In studying perception, we explore a small part of human sensory-motor behavior, namely, the construction of eye movements. Indeed, the phenomenon of theoretical perception appears to be vivid at the level of sensory-motor processes, as experts, unlike novices, immediately discern relevant information and perform specific eye movements (Gegenfurtner et al., 2011; Jarodzka et al., 2010; Krichevets et al., 2014; Ooms et al., 2012; Wood et al., 2013). In particular, in geometry problem solving, experts focus on the white space of possible auxiliary constructions (Epelboim & Suppes, 2001). In this paper, we compare adults’ recognition of geometric shapes (Shvarts et al., 2019) with that of primary school children and explore the sensory-motor processes in the formation of theoretical perception in geometry by analyzing diverse students’ strategies.

Remarkably, the theoretical model developed in this paper offers an alternative to the eye-mind hypothesis, which assumes that a subject processes visual information as long as the eye is fixated on it (Just & Carpenter, 1980). This hypothesis has found no strict empirical confirmation either in studies of reading and natural scene perception (Rayner, 1998) or in mathematics education studies (Schindler & Lilienthal, 2019), yet it is extensively used in the educational field because alternative theoretical interpretations of eye movements are lacking. As we elaborate further, from our theoretical perspective, eye movements reflect the need to compare sensory feedback from the environment with anticipated sensations relevant to a cultural activity at hand.

2 Perception and eye movements from cultural-historical and physiology of activity perspectives

2.1 Sensory-motor processes as a physiological basis of perception

The structure of the human eye does not seem to be very efficient. Only a small area of the retina (fovea) has a good resolution, and most color receptors are concentrated in this area. Therefore, humans can see clearly and in color only about 5 degrees of the visible field at any given moment (Kolb et al., 1995). Humans also have a physiological blind spot on the retina, where there are no receptors at all. These organ disadvantages are compensated firstly by eye and head movements and secondly by brain mechanisms.

Vision, as a function not only of the eye but of the entire organism, is fulfilled by acts of movement of the eyes, head, and whole body. Consequently, we suggest theorizing eye movements in line with the general processes of movement construction. We consider the physiology of activity by Nikolai Bernstein as the basis for our analysis. From the physiology of activity perspective, any human movement is constructed to solve a motor problem. Bernstein’s theory (1947/1967) states that a living system is active towards its environment as it continuously anticipates sensorial inputs in the form of a “model of the desired future” (or a forward model). These models are at the core of implementing goal-determination of human behavior. A human constantly predicts the future surroundings, plans an action, and then carries it out. Continuous sensory feedback on how the action is taking place in relation to the prediction, together with corresponding corrections, is necessary for effective behavior in a constantly changing environment. Thus, an action is based on initial planning and continuous adjustments according to the discrepancy between the forward model and sensory feedback on the motor action, which together form a so-called reflex loop.

Bernstein conducted his experimental studies on skeletal musculature movements, but considered a possibility of extending his theory to eye movements:

…the whole act of vision is active from the very beginning to the very end: we search with our eyes for an interesting object and track it by placing its image into the most sensitive and sharp area of the retina; we assess distance to an object based on strain in the eye muscles; we scan the object, “feel” it with our gaze… (Bernstein, 1991/1996, p. 32)

Based on Bernstein’s physiology of activity on the one hand and the activity theory by Leontiev (2009) on the other hand, Gippenreyter and her colleagues conducted a series of eye movement studies on solving motor, mental, and perceptual problems (Gippenreyter, 1978). According to these studies, the main factor determining eye movement characteristics is the goal within the problem under consideration and its place in the structure of the current activity. A visual system solves many problems that often coexist at the same time: (1) tracing a moving object; (2) focusing on objects at different distances; (3) participating in higher-level tasks, such as reading, visually exploring an object, and estimating length; and (4) regulating all other human actions, such as manipulating tools. The eye, as a part of a perceptual system, is controlled by a system of goals that a subject pursues in their activity.

A function of the visual system is to provide sensorial feedback on motor actions and to construct a forward model for further comparison with the feedback. Hereafter, we will use the term anticipatory image to address the notion of a forward model in the case of a visual system. This image is an anticipation of the sensory afferentation that would emerge once a motor action enabling better vision (such as movement of the eyes, head, and body) is conducted. Unlike the notion of mental representation, the concept of anticipatory image does not theoretically point to the reconstruction (representation) of reality as precisely as possible, but stresses the anticipation of the environment in a way that is adequate for the task at hand.

To investigate the specific role of eye movements in constructing such an image, researchers conduct experiments with a stabilized projection of visual information on the retina (see for example Cole, 2007). Researchers experimentally prevent eye movements and assess the limitations of perception in such conditions, thus explicating the role of eye movements. Zaporozhets and colleagues (1967) showed that a lack of eye movements generates numerous possible images, which more or less correspond to the stimulation. “An autonomously operating visual system provides an observer with various images, most of which are not adequate to reality” (Zaporozhets et al., 1967, p. 305, translated by AS). For example, a system of concentric circles stabilized on the retina may be perceived as a tunnel or a toy pyramid. To select an adequate image from several possible interpretations, one has to check its congruency with the environment by looking from slightly different positions, that is, by moving the eyes. It is through eye movements that a person can compare an anticipatory image with sensations coming from the environment: “The visual system does not only put an image in the center by motor action but also sets an eye in such a position that it ensures maximum correspondence between the optical and phenomenal fields” (p. 306, translated by AS).

Overall, in the course of eye movements, the perceptual field is superimposed as efficiently as possible on the anticipated pre-activated image. Once this position is found, visual field stabilization (a fixation) takes place. Then, the next fixation point is chosen by comparing the received sensory activation on the entire retina and the anticipated image.

2.2 The development of theoretical perception in geometry and extrafoveal analyses

From the cultural-historical perspective, the process of perception is accomplished through exploratory actions that examine an object to construct its anticipatory image (Zaporozhets et al., 1967). Some experiments have shown that perceptual exploratory actions develop through mastering practical actions. For example, Zaporozhets and colleagues (1967) studied three-year-old children playing with a sorter. Before children find the correct hole for a figure, they unsuccessfully try to physically squeeze the figure into holes that do not correspond to its shape. Gradually, these unsuccessful attempts are replaced by placing the figure close to the hole and just looking at it, thus exhibiting external orienting actions. Later, practical actions disappear completely. Children develop a perceptual action for relating a figure’s shape to a hole for it:

The girl was taking a figure and, bringing it up to the grid [with the holes for figures], would repeatedly shift her gaze from the object to each hole and from one hole to another, after this, she would push the object into the corresponding hole without any hesitation (p. 194, translated by AS).

Thus, eye movements replace practical actions for solving the problem. The replacement of the practical actions by the perceptual actions of moving the eyes is not the end of the development. In experimental studies, Podolskiy (1977) showed that participants could learn to recognize complex patterns simultaneously—without any eye movements. Our experiments showed that adults could select a figure of the target geometric shape from four alternatives relying on extrafoveal analysis (Shvarts et al., 2019). Overall, mastering can be characterized as a transformation of initially practical actions to perceptual actions with eye movements and further into the extrafoveal analysis, invisible to an observer. Those stages exemplify a contraction—gradual diminishing of external components of an action—which is also observed in diminishing gestural and verbal expressions with mastering a new skill (Radford, 2021; Zaporozhets et al., 1967).

According to Zaporozhets and Bernstein, the selection of the target figure is based on its anticipatory image. How general can this image be? In the experiments by Zaporozhets and colleagues (1967), after changing the shapes of the figures and holes, some children again needed practical actions to find the correspondence between figures and holes. Other children could immediately adapt their perceptual actions to the new situation, thus demonstrating that they learned to build perceptual forward models for putting a figure in a hole for any shape without needing practical actions. Therefore, while some children could anticipate the success of practical actions based on visual analysis only for specific shapes, other children learned to build anticipatory images based on visual analysis for shapes in general.

In another study, Ruzskaya (1966) investigated the process of differentiating triangles and quadrilaterals by 3- to 7-year-old children. Researchers taught children the verbal definitions of a triangle and quadrilateral and asked the children to name each demonstrated figure. This action of naming helped only the oldest children. The younger children would identify a figure as a triangle only when the adult had named it before or if it had perceptual similarities with the initial figure. At the same time, when 5- to 6-year-old children managed to name figures correctly, it did not mean that they could report how many vertexes or sides a figure had. Rather, they reported monitoring some arbitrary features (such as “similar to a flag”), generalizing this similarity in a way that occasionally matched a conceptual class. In a later series, the children were taught to perform a practical action of tracing the geometric figure using a pointing gesture and simultaneously counting its sides. After the intervention, all children were able to recognize and name shapes correctly, as well as conceptually ground their choice. Thus, the formation of theoretical perception requires cultural strategies that emerge in a joint adult–child activity (see also Radford, 2021).

The research presented above allows us to suggest a general description of anticipatory image development: perception develops through mastering practical actions that help in identifying target objects as fitting those actions’ purposes. Later, recognition of an object is based on re-activating the sensations that would occur in cases of successful actions. A broader activity in which those practical actions are embedded is critical for forming a correct class of objects. For example, children are mostly engaged with such objects as toys and thus embed the perception of geometric figures into anticipation of those toys. The development of theoretical perception in geometry requires developing new actions that highlight geometric aspects of the figures (such as the number of vertexes). Further on, practical actions that include counting the vertexes might contract (reduce and disappear) and be replaced by perceptual actions (e.g., visible as eye movements) based on anticipatory images established during the practical actions. Later, eye movements are also replaced by activating a forward model (an anticipatory image) based on the anticipation of how a figure would appear if eye movements were performed.

2.3 A summary: eye movements in recognizing geometric shapes

Eye movements are part of sensory-motor processes that serve to solve a perceptual task (e.g., finding a target shape). The organization of these processes is subordinate to the task at hand as well as to a subject’s perceptual abilities. Eye movements are constructed so that an eye receives the most appropriate sensory feedback, which is compared with continuously developing anticipatory images. In ontogenesis, with growing expertise, new perceptual abilities develop. In particular, children learn to recognize geometric shapes. This geometric perception develops by advancing anticipatory images of geometric shapes through continuous interaction with the environment: at first, in the form of practical actions with the figures and later through acting with the figures by eye movements. This view contrasts with the linear information processing suggested by cognitive science, which assumes that perception is largely determined by incoming sensory inputs and separates it from conceptual knowledge. Instead, we suggest that conceptualization is a feature of perceptual and even sensorial processes themselves. Such a view provides a ground for the fusion between concepts and figural aspects that was already described by Fischbein (1993).

3 Empirical analysis of theoretical perception in geometry

While theoretical investigation aims to resolve inconsistencies of previous models, didactical inquiry calls for contributing to the coordinated development of visual and conceptual abilities in geometry learning. Pedagogical systems developed within the cultural-historical paradigm are used in primary and secondary schools (Davydov, 1986/2008; Guseva & Solomonovich, 2017), yet their theoretically grounded extension to geometry education is still challenging for educational researchers and designers. As a very first step in studying possible didactical applications of our theory, we turn to the analysis of ontogenetic paths in the development of sensory-motor processes in geometric shape recognition. By tackling how perception is organized rather than how it should be organized in a future educational system, our study informs the prospective design of educational interventions that would take into account the sensory-motor processes of students. Our empirical research question is: What are the particularities of the sensory-motor processes of 7-year-old students compared to adults when demonstrating their theoretical perception by successfully recognizing geometric shapes?

3.1 Methodology of the empirical study

While the cultural-historical approach assumes conducting a developmental experiment that would reveal the construction of theoretical perception through intervention, the physiology of activity aims to reveal sensory-motor strategies through the analysis of movement trajectories (Bernstein, 1967). Our current study is in line with the second approach as we trace sensory-motor processes using eye-tracking, and a developmental experiment is its potential future. Yet, we explore sensory-motor processes in a cultural task as we operationalize theoretical perception as the ability to recognize rectangles and squares—the shapes that all participants have encountered in the cultural practice of learning. In the Russian educational system, children study simple geometric shapes at kindergarten, and the classification of quadrilaterals and triangles, including definitions, at elementary school; at secondary school, all students follow a three-year course on plane geometry. As a result, all adults develop a theoretical perception of geometric figures as they easily recognize rectangles and squares.

However, the sensory-motor processes in this theoretical perception differ depending on the complexity of the task (Shvarts et al., 2019): eye movements intensify for recognizing figures in non-prototypical positions and figures among similar distractors. This difference in the sensory-motor complexity of the perceptual tasks for adults allows us to explore ontogenetic stages in the development of sensory-motor processes that serve theoretical perception. We may expect that sensory-motor processes contract asynchronously for different types of tasks as perception develops and children come to recognize some shapes earlier and more easily than others. Previously, we have qualitatively described the variety of strategies across adults, and here we use the average and variability within the entire adult sample as a baseline that represents adults’ theoretical perception of geometric figures. This baseline allows us to describe children’s sensory-motor processes with respect to the endpoint of shape recognition development.

The research design is a combination of an experimental investigation and a multiple case study of children.

3.1.1 Stimuli

The whole experiment included 64 trials, each presenting four quadrilaterals. All figures had 4–6° of visual angle size, and their centers were located at 12° distances from the screen center (see Fig. 1). There were three experimental factors. The target figure could be either a square or a rectangle. Distractors could be either similar to the target (a square among rhombuses or a rectangle among parallelograms) or dissimilar to the target (a square or rectangle among irregular quadrilaterals), thus determining the complexity of the target figure discrimination. Furthermore, we were interested in whether the phenomenon of easier perception of figures in standard (so-called prototypical) positions (Gal & Linchevski, 2010) could be traced at the level of sensory-motor processes. Therefore, we varied the target position: figures were prototypical (on their base) or rotated. Thus, we got a 2 × 2 × 2 experimental design with three factors: target, distractor, and rotation (see Fig. 1). The target area (A, B, C, or D) was quasi-randomly varied across the eight trials of each type.
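The factorial structure described above can be sketched in a few lines of Python. This is only an illustrative reconstruction, not the authors' stimulus script: the function name `build_trials`, the dictionary keys, the balanced use of each target area (twice per trial type), and the shuffling of trial order are all assumptions made for the example.

```python
from collections import Counter
from itertools import product
import random

TARGETS = ("rectangle", "square")
DISTRACTORS = ("dissimilar", "similar")
ROTATIONS = ("prototypical", "rotated")
AREAS = ("A", "B", "C", "D")
TRIALS_PER_TYPE = 8  # 8 trial types x 8 trials = 64 trials


def build_trials(seed=0):
    """Return a list of 64 trial descriptions for the 2 x 2 x 2 design."""
    rng = random.Random(seed)
    trials = []
    for target, distractor, rotation in product(TARGETS, DISTRACTORS, ROTATIONS):
        # Assumption: "quasi-random" is read here as a shuffled, balanced
        # assignment in which each area hosts the target twice per trial type.
        areas = list(AREAS) * (TRIALS_PER_TYPE // len(AREAS))
        rng.shuffle(areas)
        for area in areas:
            trials.append({
                "target": target,
                "distractor": distractor,
                "rotation": rotation,
                "target_area": area,
            })
    rng.shuffle(trials)  # assumption: trial types are intermixed
    return trials


trials = build_trials()
per_type = Counter((t["target"], t["distractor"], t["rotation"]) for t in trials)
print(len(trials), len(per_type))
```

Crossing the three two-level factors yields eight trial types, and eight trials per type give the 64 trials reported for the experiment.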

Fig. 1 Examples of all eight stimuli types in a 2 × 2 × 2 experiment design with three factors: target, distractor, and rotation

3.1.2 Apparatus

We used SMI RED 120 Hz for eye movement tracking and the iViewX software for their recording. The stimuli were presented using Experiment Center 3.3. The monitor was 22″ with 1680 × 1050 resolution.

3.1.3 Procedure

The participants sat at a distance of approximately 60 cm from the monitor. Before the experiment, each participant underwent a 12-point calibration reaching an accuracy of 0.5°. Each trial started by showing the name of the target shape at the center of the screen. The participants were asked to read the target name, then press the space button, and fixate their gaze on the fixation cross (see Fig. 2). After a 500 ms fixation at the center, the stimuli appeared automatically. The task was to find the figure of the target shape as quickly as possible, press the space button, and name the letter of the target area (A, B, C, or D). After the space button was pressed, the figures were replaced by a mask, thereby preventing after-image processing.
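To give a feel for the physical scale of the display, the reported viewing distance can be combined with the stimulus sizes and monitor parameters given in the Stimuli and Apparatus subsections. The sketch below uses standard visual-angle geometry; it assumes a flat screen with square pixels, a viewer centered in front of the screen, and a diagonal of exactly 22 inches. The function names are hypothetical, and the size formula treats a figure as if it were centered on the line of sight, so values for eccentric figures are approximate.

```python
import math

VIEWING_DISTANCE_CM = 60.0   # approximate distance reported in the Procedure
DIAGONAL_INCH = 22.0         # monitor diagonal reported in the Apparatus
RES_X, RES_Y = 1680, 1050    # monitor resolution reported in the Apparatus


def screen_width_cm(diagonal_inch=DIAGONAL_INCH, res_x=RES_X, res_y=RES_Y):
    """Physical screen width, assuming square pixels."""
    diagonal_cm = diagonal_inch * 2.54
    return diagonal_cm * res_x / math.hypot(res_x, res_y)


def size_cm(angle_deg, distance_cm=VIEWING_DISTANCE_CM):
    """Extent of an object that subtends angle_deg when centered on the line of sight."""
    return 2 * distance_cm * math.tan(math.radians(angle_deg) / 2)


def eccentricity_cm(angle_deg, distance_cm=VIEWING_DISTANCE_CM):
    """On-screen distance from the screen center for a given visual eccentricity."""
    return distance_cm * math.tan(math.radians(angle_deg))


def cm_to_px(length_cm):
    """Convert a horizontal on-screen length to pixels."""
    return length_cm * RES_X / screen_width_cm()


for label, value in [("4 deg figure", size_cm(4)),
                     ("6 deg figure", size_cm(6)),
                     ("12 deg eccentricity", eccentricity_cm(12))]:
    print(f"{label}: {value:.1f} cm, about {cm_to_px(value):.0f} px")
```

Under these assumptions, the figures span roughly 4 to 6 cm on the screen and their centers lie about 13 cm from the fixation cross, which is well outside the foveal region discussed in Sect. 2.1.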

Fig. 2 The sequence of the screen frames in a trial

3.1.4 Participants

The child sample consisted of seven first-graders (7 to 8 years old; 2 females, 5 males) from one class in a primary school in the center of Moscow who had completed the lessons on recognizing squares and rectangles within one of the standard book series (Gejdman et al., 2012). The adult sample consisted of 22 university students (10 females, 12 males; aged 18 to 25 years). All participants had normal or corrected-to-normal vision; data of one child were removed due to calibration issues.

3.1.5 Data processing

A mixed-methods analysis was conducted. First, three authors of the paper watched children’s gaze replay video records. We were trying to understand how participants could spot the figure of the target shape and further report it as such, so we paid attention to the total number of figures visited and the number of fixations on each figure. This way, we could reveal which figures were mapped, how many times they were mapped on the foveal regions of children’s eyes, and in which cases extrafoveal perception would be sufficient for theoretical perception of geometric shapes. The qualitative observation revealed an interesting behavior: some participants would perform a few consecutive fixations on the same figure as if they remapped the figure on the retina in different positions. In addition, visiting the target figure once was often insufficient: participants would come to the same target figure repeatedly before reporting their decision as if confirming their choice.

Qualitative observations informed our final choice of quantitative metrics. Using BeGaze 3.3 software, the screen was divided into eight non-overlapping areas of interest (AOI): four large zones with the figures (ZA, ZB, ZC, and ZD) and four small zones with letters in the center (Za, Zb, Zc, and Zd; see Fig. 3). Then, we considered the fixation sequence in each trial and counted several parameters: (1) the number of large zones with geometric figures fixated by a participant’s gaze, termed fixating figures (FixFig); (2) the number of small zones with letters fixated by a participant’s gaze, termed fixating letters (FixLet); (3) the number of consecutive fixations on the target figure (FixOnTarg); (4) the number of consecutive fixations on the distractor (FixOnDist); and (5) how many times the target figure was revisited, termed revisiting target (RevTarg). We also report the average time (time) required for solving the tasks, so that the reader can better imagine the process of solving such tasks by adults and children. However, we do not analyze the time parameter in detail: this problem-solving time is composed of the fixations counted in the previous parameters and does not add to the understanding of how eye movements support theoretical perception. All parameters were computed in Python 3.9 based on AOI fixation data exported from BeGaze 3.3.
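The five parameters listed above could be derived from a per-trial sequence of fixated AOIs along the following lines. This is a minimal sketch, not the authors' script: the input format (a list of AOI labels, one per fixation), the function name `trial_metrics`, and the operational readings of the metrics are assumptions. In particular, FixOnTarg and FixOnDist are taken here as the longest run of consecutive fixations on the target or on a single distractor, and RevTarg as the number of returns to the target after the gaze has left it.

```python
from itertools import groupby

FIGURE_ZONES = {"ZA", "ZB", "ZC", "ZD"}  # large zones containing the figures
LETTER_ZONES = {"Za", "Zb", "Zc", "Zd"}  # small central zones containing the letters


def trial_metrics(fixations, target_zone):
    """Compute the five gaze parameters for one trial.

    fixations   -- AOI labels in temporal order, one label per fixation
    target_zone -- label of the large zone that contains the target figure
    """
    # Collapse the sequence into runs of consecutive fixations on the same AOI.
    runs = [(zone, len(list(group))) for zone, group in groupby(fixations)]
    target_runs = [n for zone, n in runs if zone == target_zone]
    distractor_runs = [n for zone, n in runs
                       if zone in FIGURE_ZONES and zone != target_zone]
    visited = set(fixations)
    return {
        "FixFig": len(visited & FIGURE_ZONES),
        "FixLet": len(visited & LETTER_ZONES),
        "FixOnTarg": max(target_runs, default=0),     # assumed: longest run
        "FixOnDist": max(distractor_runs, default=0),  # assumed: longest run
        "RevTarg": max(len(target_runs) - 1, 0),       # returns after leaving
    }


# Hypothetical trial: the gaze lands on the target (ZA), checks a distractor
# (ZC) with two fixations, and then returns to the target.
print(trial_metrics(["ZA", "ZC", "ZC", "ZA"], target_zone="ZA"))
```

For the hypothetical sequence in the example, two figures are fixated, the distractor receives two consecutive fixations, and the target is revisited once; a trial solved without any fixation on the figures yields zeros throughout.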

Fig. 3 Areas of interest that were used for the calculation of the quantitative metrics

For the adult sample, we averaged the data across the eight tasks of the same type (see Fig. 1) and conducted a repeated measures ANOVA (SPSS 19.0) with three within-subject experimental factors (according to a K-S test, the distributions were normal; according to Mauchly’s sphericity test, the variances were equal). The ANOVA revealed how adults’ involvement of eye movements depended on the task particularities despite their individual differences, thus providing a sensory-motor characteristic of adults’ theoretical perception as the endpoint of development. As we aimed to analyze the variety of developmental paths rather than group tendencies, we characterized each child individually. We report Z-scores for our parameters (ZFixFig, ZFixOnTarg, etc.) per child, which were calculated across all conditions as follows: (Mchild − Madults) / SDadults. These Z-scores allow judging whether a child essentially differs from the adult population for each parameter and each experimental condition, and they ground the qualitative descriptions of the children.
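The Z-score formula above can be illustrated with a short Python sketch. The function name and the numbers are hypothetical, and the choice of the sample standard deviation (`statistics.stdev`) over the individual adult means is an assumption, since the text does not specify which estimator was used.

```python
from statistics import mean, stdev


def child_z_score(child_mean, adult_means):
    """(M_child - M_adults) / SD_adults for one parameter in one condition.

    child_mean  -- the child's mean value of the parameter
    adult_means -- individual adult means for the same parameter and condition
    """
    # Assumption: SD_adults is the sample standard deviation (n - 1 denominator).
    return (child_mean - mean(adult_means)) / stdev(adult_means)


# Hypothetical data: three adults and a child who fixates more figures.
adults_fixfig = [1.0, 2.0, 3.0]
print(child_z_score(3.0, adults_fixfig))  # 1.0: one adult SD above the adult mean
```

A positive value indicates that the child produced more of the given eye-movement behavior than adults did on average, expressed in units of adult variability.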

3.2 Results

3.2.1 Geometric shape recognition by adults

We first report the results of the adult group, which we use as a baseline for a description of the first graders’ strategies. The data from those participants were previously reported (Shvarts et al., 2019); in the current paper, we supplement the previous analysis with other parameters that allow for characterizing differences between children and adults. On average, adults solved the tasks in Mtime = 1.138 s (SD = 0.409 s) and attended foveally to slightly more than a single figure: MFixFig = 1.25, SD = 1.05. Individual means varied from 0.38 to 2.37: they solved the tasks with one or two fixations, or even without them. This means that most adults effectively relied on their extrafoveal analysis to correctly choose the target figure and fixated on it for final confirmation. Therefore, overall, we see that theoretical perception of geometrical conceptual aspects of visual material is largely possible without moving the eyes, just through activation of the extrafoveal region and through attention. The repeated measures ANOVA of the experimental factors’ influence revealed that the target shape (square or rectangle) did not significantly influence the number of fixated figures (MFixFig): both squares and rectangles could be recognized equally well by extrafoveal analysis before directly fixating on them. At the same time, both other factors, the type of distractor and the rotation of the target, significantly influenced the number of fixated figures. This number was lower in the trials with figures in prototypical positions (MFixFig = 1.13, SD = 1.01) than in the trials with figures in rotated positions (MFixFig = 1.37, SD = 1.07): F(1,21) = 27.57, p < 0.001, partial eta squared = 0.568. Also, the number of figures attended foveally was higher when distractors were similar to the target (MFixFig = 1.52, SD = 1.12) than in the case of irregular quadrilaterals used as distractors (MFixFig = 1.01, SD = 0.92): F(1,21) = 110.24, p < 0.001, partial eta squared = 0.84. Such an increase in the number of fixated figures means that theoretical perception in these more complex conditions requires adults to check extra figures by a direct fixation more often.

As extrafoveal analysis was prominent in searching for the target figure, the adults rarely looked more than once at the target figure: depending on the condition, the mean number of fixations on the target figure (MFixOnTarg) varied from 0.77 (SD = 0.90) to 0.99 (SD = 0.91). Only rotation of the target figure (but not the distractors or the type of target) influenced this parameter: F(1, 21) = 13.33, p = 0.001, partial eta squared = 0.39. So, adults sometimes re-mapped target figures in rotated positions on the retina: apparently, adults’ anticipatory images were less ready for distinguishing rotated non-prototypical targets, and such re-mapping could help to check whether the figure matches the target shape.
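The effect sizes reported for the adult group can be cross-checked from the F statistics alone. The sketch below uses the standard identity that partial eta squared equals F·df_effect / (F·df_effect + df_error); the function name and the dictionary labels are illustrative and do not come from the original analysis.

```python
def partial_eta_squared(f_value: float, df_effect: int, df_error: int) -> float:
    """Recover partial eta squared from an F statistic and its degrees of freedom."""
    return (f_value * df_effect) / (f_value * df_effect + df_error)


# F statistics reported above for the adult group, all with df = (1, 21).
reported_f = {
    "rotation, fixated figures": 27.57,
    "distractor type, fixated figures": 110.24,
    "rotation, fixations on target": 13.33,
}

for label, f_value in reported_f.items():
    print(f"{label}: {partial_eta_squared(f_value, 1, 21):.3f}")
```

Rounded, the three values agree with the reported effect sizes of 0.568, 0.84, and 0.39.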

Adults very rarely looked at the distractors and even more rarely made more than one fixation on those figures: the mean number of consecutive fixations on distractors (MFixOnDist) varied from 0.40 (SD = 0.62) to 0.82 (SD = 0.67), depending on the condition. Revisits of the target figures were also rare: MRevTarg = 0.38, SD = 0.52 on average.
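To make the four parameters more concrete, the following sketch derives per-trial counts from an ordered list of fixated areas of interest. It is only one plausible operationalization: the function name, the AOI labels, and the choice to take the longest run of consecutive fixations are assumptions for illustration, and the definitions actually used in the study are those given in its data-processing section.

```python
from itertools import groupby


def trial_metrics(fixated_aois, target):
    """Summarize one trial from the ordered list of fixated figure AOIs.

    Assumed (hypothetical) definitions:
      fix_fig     - number of distinct figures fixated at least once
      fix_on_targ - longest run of consecutive fixations on the target
      fix_on_dist - longest run of consecutive fixations on any distractor
      rev_targ    - number of returns to the target after leaving it
    """
    # Collapse consecutive fixations on the same AOI into visits.
    visits = [(aoi, len(list(run))) for aoi, run in groupby(fixated_aois)]
    target_runs = [n for aoi, n in visits if aoi == target]
    distractor_runs = [n for aoi, n in visits if aoi != target]
    return {
        "fix_fig": len(set(fixated_aois)),
        "fix_on_targ": max(target_runs, default=0),
        "fix_on_dist": max(distractor_runs, default=0),
        "rev_targ": max(len(target_runs) - 1, 0),
    }


# Hypothetical trial: target in sector "B", distractors in "A" and "C".
print(trial_metrics(["B", "B", "A", "B", "C", "B"], target="B"))
# A trial solved purely extrafoveally has no fixations on any figure.
print(trial_metrics([], target="B"))
```

Under these assumptions, a trial without any fixation on a figure yields zeros throughout, which corresponds to recognition by extrafoveal analysis alone.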

Overall, adults on average do not need to look directly at the figures to recognize squares and rectangles; their theoretical perception thus includes eye movements only to check a figure already pre-selected by extrafoveal analysis.

3.2.2 Geometric shape recognition by children

We describe four groups of one or two children each, whose sensory-motor processes in theoretical perception differ from those of the adults. In the figure panels below (Figs. 4, 5, 8, and 10), we present the children’s mean values in comparison with the adults’ means and standard deviations for our four parameters (see 3.1, data processing) across the task trials, which varied along three experimental factors. Examples of all stimuli are presented in Fig. 1. On the graphs, we first present four data points with rectangles and then four data points with squares. Within each set of four, figures in prototypical positions come first, followed by figures in rotated positions. Lastly, the data points for stimuli with dissimilar and similar distractors alternate. This order allows us to examine the influence of each of the factors: e.g., a vivid influence of the distractors’ type leads to a saw-toothed line. All children’s names are changed.
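The ordering of the eight data points on each graph can be written out explicitly. The sketch below enumerates the three two-level factors in the order described above; the variable names are illustrative, and placing the dissimilar distractors before the similar ones within each pair is an assumption based on the order in which they are mentioned.

```python
from itertools import product

# Slowest-varying factor first, fastest-varying factor last.
targets = ["rectangle", "square"]
positions = ["prototypical", "rotated"]
distractors = ["dissimilar", "similar"]  # assumed order within each pair

conditions = list(product(targets, positions, distractors))

for index, (target, position, distractor) in enumerate(conditions, start=1):
    print(index, target, position, distractor)
```

Because the distractor type alternates between neighbouring points, an effect of that factor alone shows up as the saw-toothed line mentioned above.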

Fig. 4 Danya’s and Kyrill’s data for different stimuli among the adult baseline

Danya and Kyrill (see Fig. 4) were the children who recognized the figures in the manner most similar to the adults. Although their mean time was about two standard deviations above the adults’ time (Mtime = 1.92 for Danya and 1.79 for Kyrill), their parameters were within the standard deviation corridor of the adult averages for most conditions (see Footnote 1). The children revisited target figures slightly more often than adults (ZRevTarg equals 1.55 for Danya and 1.94 for Kyrill). In particular, the most difficult cases of distinguishing rotated rectangles and squares from similar distractors (parallelograms and rectangles, correspondingly) required Kyrill to look at the target more than once, with a frequency more than two standard deviations higher than that of the average adult (RevTarg, the dashed green line in the web version of the article, see data points for rectangles and squares-rotated-similar). Interpreting from our theoretical perspective, we see that an anticipatory image for the most difficult recognition task appeared less developed in Kyrill’s perceptual system.
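As a rough check on how far the two children’s solution times lie from the adult baseline, the sketch below standardizes their mean times against the adult mean and standard deviation reported in 3.2.1. It assumes that all times are in seconds and that a child’s Z value is computed as (child mean − adult mean) / adult SD; the exact procedure behind the reported Z values may differ.

```python
ADULT_MEAN_TIME = 1.138  # adult Mtime, in seconds (reported in 3.2.1)
ADULT_SD_TIME = 0.409    # adult SD of solution time, in seconds


def z_against_adults(child_mean: float, adult_mean: float, adult_sd: float) -> float:
    """Distance of a child's mean from the adult mean, in adult SD units (assumed definition)."""
    return (child_mean - adult_mean) / adult_sd


child_mean_times = {"Danya": 1.92, "Kyrill": 1.79}

for name, mean_time in child_mean_times.items():
    z = z_against_adults(mean_time, ADULT_MEAN_TIME, ADULT_SD_TIME)
    print(f"{name}: z = {z:.2f}")
```

Under this assumption, Danya’s time lies about 1.9 and Kyrill’s about 1.6 adult standard deviations above the adult mean.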

Fig. 5 Viktor’s data for different stimuli among the adult baseline

Viktor (see Fig. 5) is the child who made the largest number of fixations in our sample: he deviated by more than two standard deviations from the adults’ mean in all variables in almost all conditions (ZFixFig = 2.17, ZFixOnTarg = 4.14, ZFixOnDist = 2.76, ZRevTarg = 4.04, and Mtime = 3.640). Remarkably, in difficult conditions, he checked many figures, including the distractors when they were similar to the target figures (Fig. 5, FixFig, see data points for Similar). In Fig. 6, we provide consecutive screenshots of his examinations in one trial (a rectangle in prototypical position among similar distractors). After quickly finding the rectangle, he looked at the distractors and returned to the rectangle, repeatedly exploring its angles.

Fig. 6 Viktor’s unfolding eye movements in one probe with the target rectangle: (a) Viktor attends the rectangle for the first time before any other figures, (b) Viktor attends the rectangle again, and (c, d) Viktor explores the distractors and returns to the rectangle again and again

Viktor explored the target figures in detail, with many consecutive fixations (see Fig. 7 for an example), which was particularly evident for the figures in rotated positions and for the rectangle in prototypical position when presented among dissimilar distractors, that is, general quadrilaterals (see Fig. 5, FixOnTarg).

Fig. 7 Viktor’s consecutive fixations on the target rectangle: the rectangle was remapped on the different positions of the retina multiple times

The other three students’ performances can be considered as intermediate between Viktor’s results and the results of Danya and Kyrill.

Elena attended to more figures than the average adult, by more than two standard deviations (see Fig. 8, ZFixFig = 2.41), and she also took a lot of time (Mtime = 2.633). However, her pattern differed from that of the adults and Viktor (no sawtooth line): the conditions with similar distractors were not more difficult than the conditions with irregular quadrilaterals. On the contrary, for rotated rectangles among parallelograms, she attended to fewer figures, although for adults it was one of the most difficult conditions. Careful analysis explains why. Elena explored the rotated rectangles and parallelograms a lot (see Fig. 9), irrespective of them being targets or distractors. These shapes appear as target figures in the condition with rectangles in rotated positions (see Fig. 9a; Fig. 8, FixOnTarg, rotated rectangle between similar distractors), but also as similar distractors for rectangles (parallelograms, see Fig. 9b, Fig. 8, FixOnDist) and similar distractors for squares in rotated positions (rectangles, see Fig. 9c; Fig. 8, FixOnDist). The number of revisits is high for Elena: she looked again and again at the target figure, often from a slightly different position, as if memorizing that an example of the class might appear in any of these ways depending on the position of eye fixation, thus expanding her anticipatory image.

Fig. 8 Elena’s means for different stimuli among adult baseline

Fig. 9 Elena’s eye movements while exploring rectangles and parallelograms: she remaps and revisits those figures multiple times independently from them being target figures or distractors

Pavel and Tanja represent two interesting cases: for some conditions, they appear to match the adult level of performance, while in other conditions, they still need a lot of foveal examination (see their lines on Fig. 10 being at the border of the adults’ standard deviation corridor or far outside). Yet, Pavel and Tanja intensively used foveal vision for different reasons. Both of them made a few sequential fixations on the target squares but explored the target rectangles much more intensively (FixOnTarg, Squares). Tanja in particular attended to the rectangles in rotated positions (Fig. 10, FixOnTarg, rotated rectangles; Fig. 11, right) by multiple consecutive fixations. We think that she could distinguish target figures already based on preliminary extrafoveal analysis, as the increase of fixations on target figures was not accompanied by an increase in exploring distractors. Yet she explored these target figures just in case. Tanja also had many revisits of the target figures, as if repeatedly confirming her extrafoveal guesses (see Fig. 11, left).

Fig. 10 Tanja’s and Pavel’s means for different stimuli among adult baseline

Fig. 11 Tanja’s eye movements while exploring (on the left) a target square (many revisits but not many remappings) and (on the right) a target rectangle (multiple consecutive remappings)

Pavel also explored rectangles more thoroughly than squares, with consecutive fixations (Fig. 10, FixOnTarg). In particular, he was triggered by rotated rectangles. Unlike Tanja, Pavel intensively explored not only target figures but also distractors: he made on average 1.61 fixations on these types of stimuli (Fig. 10, FixOnDist, rotated rectangles; see Fig. 12 for an example). Thus, Tanja attended to the target figure immediately based on extrafoveal analysis and only then further explored the rotated rectangles, apparently confirming her choice or exhibiting a curiosity beyond the task at hand. Theoretically, such activity would advance the anticipatory image for future use. Pavel, by contrast, apparently needed to attend to figures foveally and sometimes explore them to distinguish the target shapes. He did not yet have anticipatory images of what a figure could look like after an eye movement, images that could be compared with extrafoveal sensations.

Fig. 12 Pavel’s eye movements during the search for a rectangle. Red circles highlight multiple consecutive fixations on the distractors

The last aspect of oculomotor behavior that we analyze is attending to the letters that signified the answer (see Fig. 13). In 27.3% of cases, adults replaced attending to the figures with attending only to the letters, despite the letters’ constant position and there being no need to read them repeatedly (see the first row in Fig. 13 for adults). The only explanation we could find for such behavior is what Gippenreyter (1978) called “pseudo-goal-oriented” eye movements: the eye movements were not needed to explore the figures foveally but helped in maintaining covert attention (attention to a specific region in the extrafovea) on the figures. The children behaved differently with respect to the letters: in many cases, they looked at the letters in addition to the figures (see Fig. 13 for children and the examples in Figs. 12 and 14). Apparently, children either explored the letters themselves or additionally analyzed the figures extrafoveally after attending to them foveally.

Fig. 13 Frequency for the number of the AOI with the figures and AOI with the letters attended by children and adults. 100% represents all trials by all participants of the corresponding group. Gray highlights the similarity between adults and children. Yellow highlights that adults more often attended to the letters without attending to the figures, while children attended to the letters in addition to the figures

Danya, the child whose eye movements were closest to the adults’ processes, attended to more letters than the other children, particularly in the simplest conditions (see Fig. 14). This corroborates the idea that attending to letters is a form of perceptual action that appears later in the developmental path.

Fig. 14 Danya’s eye movements with multiple fixations on the letters

3.3 Summary of empirical findings

Children shared general eye movement patterns with the adults along two experimental factors: rotated target figures and figures among similar distractors required more fixations. At the same time, different children had difficulties recognizing particular shapes in specific positions and situations, even though they were from one school group and had had the same geometry lessons on squares and rectangles. Those individual differences demonstrate various developmental paths: theoretical perception of different shapes is formed spontaneously by the children without much guidance from the teacher and curriculum, despite the studied definitions. Moreover, these data bring evidence that theoretical perception is content-specific: each shape needs to be explored by the children in a variety of figural forms. Different involvement of sensory-motor processes in this theoretical perception demonstrated different stages of mastering shape recognition. Some children searched for a target figure by visiting all sectors foveally (see Fig. 12). Others found the target extrafoveally but needed additional fixations to confirm their choice: they returned to the target shape several times or examined the targets carefully (see Figs. 9a and 11). Finally, some children demonstrated adult-like eye movements (Fig. 4): they found the target extrafoveally and gave their answer without fixating on the figure at all.

Children’s sensory-motor processes were less contracted: children made more fixations than adults. Interestingly, some children paid particular attention to the target figures, while others explored the distractors. From our theoretical perspective, through visiting the figures, children improve their anticipatory images of geometric shapes: those forward models help students to recognize shapes without making additional saccades at later developmental stages.

We observed different functions of eye movements during our experiment. The fixations on different parts of the figures remapped a figure on the retina. In some cases, there were fixations on the angles, which could allow assessing their magnitude, and saccades along the sides, which could help in assessing equal distances. In other cases, children were just remapping figures to get different projections on the retina. Theoretically, we interpret it as searching for the best position to compare sensory activation with pre-activation of the anticipatory image and updating the anticipation by various possible positions for the same figure. Another eye movement behavior involved repeatedly exiting and fixating again on the same figure, often on the same position: apparently, some children needed to make a comparison of sensory input with the anticipatory image multiple times before finalizing the decisions of their theoretical perception. Eye movements also helped adults in regulating covert attention as they sometimes fixated on the letters instead of the figures.

Overall, we assume that mastering the anticipatory images of squares and rectangles—namely, learning to anticipate possible appearances of a figure in case eyes are moved—is the way of developing theoretical perception in geometry. Improvement of an anticipatory image leads to contraction of the eye movements in later theoretical perception: children do not need to move their eyes anymore, as an anticipatory image allows imagining the sensory outcome of possible eye movements. In our data, we see children at the different stages of this heterogeneous process.

4 Concluding remarks

We propose a model based on cultural-historical psychology and the physiology of activity that allows for understanding how students come to see mathematics in visual inscriptions, and thus perceive them theoretically. In this model, to recognize something familiar (no matter what: a person, an object, a geometric shape, or a formula) means to compare an anticipatory image with the actual sensory feedback from the environment, i.e., the activation of the retina. An anticipatory image is a forward pre-activation of neurons and the retina that marks possible sensations that might arrive while solving the current task, based on previous experiences. This forward model of sensory feedback is continuously advanced to resonate with sensations relevant to the task. Therefore, geometric perception is an active cyclic process of comparing the anticipatory image of possible sensations associated with the target geometric shapes with sensory feedback from visual inscriptions received while inspecting them in an adequate-for-the-task-at-hand way.

This theoretical view overcomes the traditional approach to perception as bottom-up processing of information from sensory inputs to a perceptual image (Atkinson & Shiffrin, 1968), as well as the split between perceptual and conceptual information (Paivio, 1990). Considering anticipatory images as a necessary part of perception may better ground the cognitive ability to operate conceptually with visual inscriptions (Duval, 1999; Fischbein, 1993) and contribute to sublating the dichotomy between the visual and the conceptual in geometry by treating the conceptual as an accumulation of possible visual sensations when acting in a given task.

Additionally, this theoretical view highlights the importance of eye movements in exploring geometrical material. Saccades set an eye in the best position to compare visually received sensations with an anticipatory image (Zaporozhets et al., 1967). The task of shape recognition can be seen as finding a match between anticipation and actual sensation through active exploration. Therefore, theoretically, we can distinguish two main functions of sensory-motor processes (in this case, eye movements) in the theoretical perception of geometrical material: (1) positioning the retina in an ideal way for a comparison of sensory feedback with the geometric anticipatory image and (2) advancing an anticipatory image based on the visual sensations.

Empirically, we explored the individual paths in the development of theoretical perception in geometry: the particularities of 7–8-year-old students’ sensory-motor processes were compared to adults’ processes when successfully recognizing geometric shapes. In consonance with the literature on the contraction of external actions (Radford, 2021; Ruzskaya, 1966; Zaporozhets et al., 1967), our data show that the number of fixations is lower for adults than for children and that fixations can even disappear as perception is accomplished by extrafoveal analysis. Adults and some children could recognize a figure without glancing at it even once. When the resonance between an anticipatory image and sensory activation is established without placing the target figure in the fovea, eye movements are not needed. Contraction of explicit actions and development of anticipatory images might explain the imagining of possible geometrical transformations that ground Duval’s (1999) notion of operative apprehension: students who have interacted with geometrical figures a lot through moving their eyes, re-positioning their heads, and making auxiliary constructions with their pens may later be able to imagine sensory feedback in these interactions and thus productively solve geometry tasks.

Examining the children’s eye movements, we found various strategies and stages of mastering shape recognition. This variability matches the ideas of a radical embodied approach on the idiosyncrasy of sensory-motor processes (Abrahamson et al., 2015). From our theoretical perspective, the development of shape recognition is primarily associated with the construction and improvement of anticipatory images of geometric shapes. Images gradually become more general as students come to anticipate possible figures in diverse positions. Nonetheless, the prototypical phenomenon (Hershkowitz, 1989; Presmeg, 2008) persists even in adult perception, as figures in a prototypical position require fewer eye movements. So, eye movements still serve shape recognition even in adulthood. For example, while foveal analysis of the figures was not needed anymore, the participants fixated on the letters at the central part of the sectors with the figures that they perceived only extrafoveally. Therefore, extrafoveal analysis, as well as foveal perception, is regulated through directing eye movements.

The empirical part of this study is a preliminary step in investigating the empirical and didactical consequences of our theoretical approach. A limitation of the current empirical study is its small sample of children: we might not have captured all steps and idiosyncratic routines in the development of shape recognition. Further research could be a longitudinal study with different experimental interventions that allows following the contraction of sensory-motor processes while children develop theoretical perception under adult guidance.

What implications for teaching geometry do our theoretical and empirical studies bring? A teacher should organize a child’s cultural practical activity with geometrical figures relevant to the conceptual perception of this geometric material. Students should learn to anticipate a variety of shape positions, thus developing maximally generalized anticipatory images. While calling for active interaction with the figures might not be new (Herbst, 2004; van Hiele, 1999), our theoretical proposal claims that conceptual relations in visual material become evident for children through specific actions. The empirical study allows only speculating on the actions helpful in supporting the emergence of conceptually relevant anticipatory images in geometry. Yet, knowing the universal difficulties that children experience with recognizing shapes of rotated figures, we expect that rotating a figure and putting it in a prototypical position might help in learning to perceive right angles. Square recognition could be supported by an origami-like folding of a square from a rectangular piece of paper when the shorter side is used to measure the other side, thus revealing their (in)equality. Such activities would lead to generalized anticipatory images that allow distinguishing conceptual information, thus forming a theoretical perception. Practicing conceptually relevant actions would lead to contraction of practical and perceptual actions, and the distinguishing of geometrical structures already extrafoveally.

Finally, our study informs eye-tracking methodology. In consonance with the previous findings (Schindler & Lilienthal, 2019), when studying eye movements, researchers must consider eye movements in the context of the task at hand and take into account the expertise of the subjects. As our data show, an absence of fixations does not necessarily mean ignoring a figure; instead, it very often indicates extrafoveal analysis. From our theoretical approach, eye movements should not be considered as fixations that serve the processing of information in the foveal region—as the eye-mind hypothesis suggests. Instead, we highlight a saccade as an act of positioning the environment on the retina so that upcoming sensations can be best compared with a relevant-to-task-at-hand anticipatory image. Then, a fixation serves this comparison and enables an extrafoveal choice of the next landing position. Overall, we consider moving eyes to be a part of problem-solving in interaction with the environment.