1 Introduction

Counting is the most frequent answer of professionals who are asked which mathematical competence is important to support in the early years (Benz 2014). This is not astonishing because verbal and object counting is a basic mathematical activity which has a long tradition in most pre-school settings.

On the one hand this importance of counting can be due to the fact that in many numerical development models counting constitutes a basic part (Fritz et al. 2013; Baroody et al. 2006). On the other hand it is also observable that children use verbal and object counting in everyday activities, and so early childhood educators can pick up these children’s activities as teachable moments (Baroody et al. 2006) or as natural learning situations (Gasteiger 2014). Object counting requires counting principles (Gelman and Gallistel 1986) and can lead to an understanding of cardinality of sets. So this strategy of determining cardinality can be considered as one important milestone in the numerical development of children.

Even if counting constitutes a milestone, a sole and extensive use of counting strategies can be a problem for acquiring conceptual knowledge of numbers as well as for solving arithmetical problems (Kullberg et al. this issue). Especially in the number range up to 100, counting is not an efficient strategy for solving arithmetic problems (Gray et al. 2000). Björklund et al. (2019) also point out that counting unit by unit does not necessarily promote the understanding of a conceptual structure of numbers, that is, of the part-whole relationship. The part-whole relationship of numbers is a crucial concept for developing other arithmetical strategies (Resnick 1992; Hunting 2003), and is therefore also an important part of numerical development models (Fritz et al. 2013; Baroody et al. 2006).

In order to illustrate numbers and especially the part-whole relationship of numbers, visual presentations (e.g., sets of objects) are used in mathematics education. For recognizing or illustrating the part-whole relationship on a visual level, the sets must be structured. That means that the objects of a set must be placed in some relationship, for example an additive relationship in which some items are seen as aggregated into a new unit, which then constitutes a subset: A set of five elements therefore consists not only of single items or units but also of the subsets of four and one or of the subsets of three and two. An additional relationship can be made clear through construing structures: Four can be seen as a part of five as the whole. This relationship can be described as class inclusion or as an understanding of an explicit nested number sequence (Steffe 1992).

The ability of perceiving and using structures in such visually noticeable illustrations of numbers (collections of concrete objects) is named visual structuring ability by Söbbeke (2005) (Sect. 2.2). If children are able to identify structures in sets, they have the chance to develop a mental image of a number that consists not only of many single items but also of parts.

This ability to shift the focus from the perception of individual elements to the perception of subsets is a basis for numerical development (Hunting 2003). The result of a study by van Nes (2009) indicates that there is a correlation between the mathematical development of numeral concepts and the use of strategies in structuring processes, which in turn provides a basis for arithmetic learning. A correlation between perceiving structures and arithmetic (or also general mathematical) performance was shown in numerous other studies (e.g., Lüken 2012a, b; Mulligan and Mitchelmore 2009; Mulligan et al. 2013). It is therefore very important to support children of kindergarten age in perceiving structures in sets and in using this perception for determining numbers. Benz (2014) shows that children at this age can already perceive structures in visually presented quantities.

Nevertheless, perception is an invisible act. With the help of observations of eye movements, our aim in the present study was to gain insights into these processes, in order to make hypotheses about how preschool children perceive structures, and how these children use the perceived structures to determine the cardinality.

2 Theoretical and empirical background

In the following, different aspects of the theoretical background concerning the current study are presented. The role of the part-whole understanding as a prerequisite for a structural perception is explained and a developmental process for visual structuring ability is shown. A theoretical model that distinguishes between the two processes of perception and determination is presented. This model serves as the basis for the study. Selected studies on structural perception and use that have applied eye tracking are presented, as well as theoretical background on visual perception and visual attention, which form the basis for the eye tracking research method that is described later.

2.1 Importance of part-whole understanding

Resnick (1983, p. 126) describes that “[…] the interpretation of numbers as compositions of other numbers […]” is an important aspect in the development of the number concept. A basic principle of numbers is therefore their additive composition. This means that numbers consist of other numbers and that each number can be divided into parts: “[…] any quantity (the whole) can be partitioned (into the parts) […]” (Resnick 1983, p. 114). On a visual level and for the structural perception of sets, this means that a set of elements (the whole) can be divided into subsets (parts).

The development of the part-whole understanding begins already at kindergarten age between four and five years (Sophian and McCorgray 1994) and usually expands and consolidates in the first grade (Irwin 1996). Structuring processes in the sense of decomposition and composition of a visually presented set of objects form a fundamental concept for the understanding of parts and the whole, since “this composing process fosters an understanding of part-whole-relations and vice versa” (Baroody et al. 2006, p. 193). This support of the part-whole understanding through the decomposition and composition of quantities causes the focus on the perception of each individual element to be shifted to the perception and recognition of subsets, which in turn is an important process for the development of the number concept, and contributes to numerical development (Hunting 2003). Further studies point to a connection between visual structuring ability and part-whole understanding (Young-Loveridge 2002). Numerous studies have also shown that part-whole understanding is essential for the development of numeral concepts (Benz et al. 2015; Fritz et al. 2013; Baroody et al. 2006) and successful mathematical learning (Fischer 1990; Resnick 1983).

The results of Björklund et al. (2019) also support these findings. They investigated the use of finger patterns in solving a subtraction task in a study with a total of 126 children aged 4 and 5 years. They regard the use of finger patterns as a way to structure a set into subsets and thus make the part-whole relationship visible. They found that there is a strong correlation between “experiencing numbers’ part–part-whole relations and showing structured finger patterns” (p. 22).

Overall, it can be stated that the part-whole understanding can be seen as a prerequisite for children to develop visual structuring abilities.

2.2 Development of children’s visual structuring abilities

How children develop visual structuring abilities was investigated by Söbbeke (2005), Mulligan and Mitchelmore (2009) and van Nes (2009) (see also Mulligan et al. this issue). Lüken (2012b, p. 115) stated that each of these three studies describes a cumulative process using a four-stage development model. Cumulative means that children at a higher level have already acquired the abilities of the lower levels. The developmental process can be described as follows: Children initially do not recognize any structures, and then begin to perceive individual structural aspects. At first they can only recognize these and later also use them, because they are increasingly able to decompose patterns and integrate substructures. Finally, they can consider several aspects simultaneously, reproduce them and use them for their own individual structuring processes. A flexible reinterpretation of representations becomes possible as well as a developing awareness of the importance of structuring for the abbreviation of numerical processes (Lüken 2012b). The latter would be the case, for example, if children used the counting strategy of counting in steps to determine the cardinality.

The link between visual structuring ability and the development of strategies not relying on counting unit by unit can also be clarified by discerning different processes when identifying cardinality of quantities.

2.3 Theoretical model of two processes: perception and determination

If someone identifies the cardinality of sets, two processes can be distinguished (Steffe and Cobb 1988; Benz 2014), namely, the process of perceiving the set and the process of determining cardinality. Both the process of perceiving a set as well as the process of determining the cardinality can be distinguished further. These two processes and their possible relationship are illustrated in Fig. 1. The model was first developed using an inductive approach and then empirically evaluated (e.g., Benz 2014; Benz et al. 2015, p. 134; Schöner and Benz 2018).

Fig. 1 Two processes: perception of sets and determination of cardinality (Schöner and Benz 2018, p. 125)

Since this model provides a theoretical basis for the object of our present research, the possible links between the two processes are described in more detail below. A particular focus is on the perception in structures and the use of structures to determine cardinality. If a set is perceived as many individual elements for determining the cardinality, only counting one by one is possible. If a set is seen as a whole there are two possibilities in determining the cardinality. In the determination process it is again possible to use the counting strategy counting all, or to apply known facts (Gray 1991, p. 554). In this last case, the two processes of perception and determination coincide and it is called perceptual subitizing (Clements 1999; Clements and Sarama 2014). Bloechle et al. (2018) could observe the same cognitive activities when children were subitizing sets with up to four randomly displayed objects as they perceived known dice patterns. Despite someone using known facts, counting strategies can still be applied (for proving).

If structural perception happens, the set is decomposed into parts or composed of parts. This kind of perceiving enables a variety of determination processes. Even if the set is perceived in structures, each single item of the set can still be counted individually to determine the cardinality. If one part of the set is known by subitizing, the rest of the set can still be determined by counting on (e.g., “Here are four—five, six, seven.”). If every part is subitized, the children can still use different counting strategies, but also non-counting strategies like nearly doubling (e.g., “Three and three equals six and one more is seven”), composing (e.g., “A full row is five and two more are seven.”) and adding-on (e.g., “That’s seven eggs, because if there were three more, it’d be ten.”). If the process of ‘perceiving a set in structures’ and the process of ‘determining cardinality with the help of known facts’ coincide, this is called structural subitizing in this paper (e.g., “There’s four and three here, and I know immediately that’s seven.”; see Schöner and Benz 2018 for more details). The term is used in distinction to conceptual subitizing, which covers the process of perceiving in structures and every possible determination process that results from a structural perceiving process. This means, for example, that counting on is included in conceptual subitizing but excluded from structural subitizing, because in counting on, although a part of the set was perceived in a structured way, the rest of the set is determined by counting every single remaining element.
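The links between the two processes described above can be summarized in a small lookup structure. The following is only an illustrative sketch: the class and function names are hypothetical, "counting one by one" and "counting all" are treated as the same strategy (an assumption made for brevity), and the mapping reflects the prose description rather than a complete rendering of Fig. 1.

```python
from enum import Enum


class Perception(Enum):
    INDIVIDUAL_ELEMENTS = "individual elements"
    WHOLE = "whole"
    STRUCTURES = "structures"


class Determination(Enum):
    COUNTING_ALL = "counting all"          # counting every item one by one
    COUNTING_ON = "counting on"            # one part subitized, rest counted
    NON_COUNTING = "non-counting strategy" # e.g. nearly doubling, composing, adding-on
    KNOWN_FACTS = "known facts"


# Which determination processes the text describes as possible
# for each kind of perception (hypothetical encoding).
POSSIBLE = {
    Perception.INDIVIDUAL_ELEMENTS: {Determination.COUNTING_ALL},
    Perception.WHOLE: {Determination.COUNTING_ALL, Determination.KNOWN_FACTS},
    Perception.STRUCTURES: {
        Determination.COUNTING_ALL,
        Determination.COUNTING_ON,
        Determination.NON_COUNTING,
        Determination.KNOWN_FACTS,
    },
}


def subitizing_label(perception, determination):
    """Return the subitizing term used when perception and known facts coincide."""
    if determination not in POSSIBLE[perception]:
        raise ValueError(f"{determination.value!r} is not described for {perception.value!r}")
    if determination is Determination.KNOWN_FACTS:
        if perception is Perception.WHOLE:
            return "perceptual subitizing"
        if perception is Perception.STRUCTURES:
            return "structural subitizing"
    return None  # a counting or non-counting strategy, no subitizing label


print(subitizing_label(Perception.STRUCTURES, Determination.KNOWN_FACTS))
```

Encoding the model as a table makes explicit that a structural perception opens up the widest range of determination processes, whereas a perception of individual elements leaves only counting.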

Because the process of perception is an invisible process, the following list gives an overview of different possibilities of how, in various studies, insights were gained into children’s perception:

  • Verbal explanations of perceiving (Benz 2014; Lüken 2012a, b).

  • Displaying objects so that it is visible at a glance without counting how many are displayed (also with explanations) (Benz 2014; Gervasoni 2015; Lüken 2012a, b).

  • Highlighting structures in presented sets (Häsel-Weide 2016).

  • Reproduction of flashed sets (Lüken 2012a, b; Mulligan and Mitchelmore 2009).

  • Conclusions based on the results of determining cardinality when sets were presented very briefly (Benz et al. 2019).

These different approaches can lead to different difficulties and challenges. Verbal explanations, for example, depend on the verbal skills of children. When children present objects in such a way that someone can easily see without counting how many are presented, and if they are not expected to explain what they have presented, conclusions about the process of perception may be drawn from children’s ideas of structuring a quantity. Still, in this case the expression seeing without counting must be understood. If children are asked to highlight structures, it is taken for granted that the children already use structures. If a presented set is required to be reproduced, memory capacity also plays a role. If sets with more than four objects are presented very briefly, the children must perceive the structure in the sets in order to know the cardinality immediately or to reconstruct the cardinality mentally later using different strategies. But this reconstruction also requires memory performance.

Due to the different limitations described above, in this study an eye-tracker was used as a research tool in order to gain insights into the perception processes of children.

2.4 Selected studies with eye tracking analysis and structural perception

Eye-tracking was used in the studies by Schindler et al. (2019) and Lindmeier and Heinze (2016) in order to make statements about strategies in structural perception and structural number determination. The two studies are presented briefly, in order to show how the tasks were designed and how data were analyzed. Some relevant results are described.

Schindler et al. (2019) interviewed 20 children, with and without arithmetic difficulties, who were on average 11 years old. In the individual interviews, each of the children wore wearable eye-tracking glasses and sat in front of a monitor showing set images of a 100-bead abacus and a 100-dot-field. They were asked to determine the number shown as quickly and correctly as possible. Later, the number of fixations and the children’s eye movements were analyzed by developing inductive categories. As a result, they found that the children’s gaze patterns could be used to identify strategies that were previously unknown, such as subitizing the biggest unit of 20 or 30 or counting fives. The authors saw an advantage in the choice of eye tracking technology as a research method because an additional verbalization step could be avoided. A contradictory statement can be found in another publication of the same year. Schindler and Lilienthal (2019) suggested there that a data triangulation between eye-tracking and other research methods, such as think aloud or post interviews, is important, since the eye-tracking data alone do not allow researchers to draw any conclusions about thought processes.

A smaller number range was investigated by Lindmeier and Heinze (2016). They conducted a study of nine first-graders and 11 adults. The tasks were presented in different representations on a monitor. The number range up to 10 was investigated with a ten field and finger pictures. The number range up to 20 was investigated with a twenty field and a picture of an abacus with twenty pearls. The children or adults had to confirm the known number by pressing a key immediately. During the interview the eye movements and the processing times were recorded. The authors described a coding scheme that was developed on the basis of the scanpaths which were assigned to different strategies. Four groups of strategies were generated: Counting strategies, subitizing up to three, structure-based strategies combined with counting strategies, and purely structure-based strategies. As an example of purely structure-based strategies the authors describe a pendulum motion of the fixations between two subsets. The results of the study showed that adults had more complex strategies in solving the problems and that they were faster than first-graders. They stated further, that different strategies could be recorded between different but structurally identical tools. The finger pictures and the ten-field, for example, have the same basic structure of two times five fingers or fields. But less complex strategies could be observed in the finger patterns than in the ten-field.

The youngest participants in the two studies described were first-graders. Despite intensive research, no study with kindergarten children with a similar focus could be found. In addition, the sample sizes of the two studies were very small with 20 persons each. Such small sample sizes are often found in eye-tracking studies due to the effort and expense involved, and accordingly the results are usually not generalizable (see also the eye-tracking study by Möller et al. (2009) with 10 children between 10 and 11 years).

2.5 Visual perception and selective visual attention

For the investigation, it is essential to consider selective visual attention in order to observe and describe perception. The following quote from William James (1890) still serves today as a basis for selective attention:

Everyone knows what attention is. It is the taking possession of the mind, in clear and vivid form, of one out of several possible objects or trains of thought. Focalization, concentration of consciousness are of its essence. It implies withdrawal from some things in order to deal effectively with others […]. (pp. 403–404)

With the help of eye-tracking, eye movements can be analyzed in order “to gain insight into the viewer’s attentive behavior” (Duchowski 2017, p. 111). The orientation of attention can be divided into two different, complementary mechanisms: exogenous orientation and endogenous orientation. Characteristic of exogenous orientation is the automatic mode of operation activated by external stimuli, such as a short peripheral flash of light. Endogenous orientation is characterized by a controlled mode of operation that is activated by internal processes and can be consciously controlled (Posner 1980; Müller and Rabbitt 1989). An orientation of attention to a certain location can be either overt or covert. With an overt orientation eye movements can be observed; with a covert orientation they cannot (Posner 1980). It has been well investigated that in the latter case, that is, without eye movements, information can be perceived in the peripheral visual field where the objects have not been fixated directly (e.g., Posner 1980).

3 Research questions

There are studies with children of kindergarten age in which structuring processes are investigated (e.g., Lüken 2012a, b; Mulligan and Mitchelmore 2009). In order to gain insights into the invisible process of perception, various approaches have been used, such as including the explanations of the children (Sect. 2.3). Eye tracking can help to gain insights into this process (Sect. 2.4).

This consideration leads to the following two research questions regarding the visual structuring possibilities of 5-year-old children:

  1. How do preschool children perceive structures?

  2. How do children use the perceived structures to determine the cardinality of a set of objects?

The questions of which challenges and opportunities can be found in the eye-tracking research method, and which helpful aspects result therefrom for a hypothesis-generating analyzing process, are not answered in this paper. These aspects were discussed in detail in another contribution (Schöner and Benz 2018).

4 Design of the study

In the present study 95 children were interviewed individually. The children attended the last kindergarten year in nine different German kindergartens and had an average age of 5 years and 4 months. The interview took place at the beginning of the kindergarten year and consisted of several parts. In this paper the focus is on the part in which we used photos of egg cartons for ten eggs. This is the usual size of an egg carton, which the children normally know from their everyday life. On a monitor the children were shown a total of 11 photos with different numbers of eggs in an egg carton. An eye-tracking camera was used to record the eye movements of the children. In addition, there were two other cameras to record gestures of the children (e.g., lip movements or pointing with the finger). One was a webcam mounted at the top of the monitor that filmed the child from the front; the other was positioned diagonally behind the child. These gestures were considered when evaluating the processes of perception and determination. The parents of the children were informed about the interview procedure as well as the aims of the study and gave their approval for the children to be recorded by video and for the data to be analyzed anonymously.

4.1 Task

In the presented analysis, six items with the cardinality of five, seven and nine were used (Fig. 2). There were different representations of the numbers five and seven. The given structure of the egg cartons could have an influence on the perception of the structure, for example in seeing the predominant pairs of two or the dice patterns of the four or six. Otherwise, seeing rows could also influence the perception.

Fig. 2 Items that were analyzed

For a better understanding of which representation is implicated, a clearly defined label is used. The abbreviation u3,b2 means that there are three eggs in the upper row and two eggs in the bottom row. These abbreviations are for communication purposes only and were not visible to the children.
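To make the labeling scheme concrete, the following minimal sketch parses a label such as u3,b2 into the number of eggs per row and the resulting cardinality. The function name is hypothetical, and the limit of five eggs per row is an assumption derived from a ten-egg carton with two rows.

```python
import re

ROW_CAPACITY = 5  # assumption: a ten-egg carton with two rows of five places


def parse_item_label(label):
    """Parse a label like 'u3,b2' into eggs per row and total cardinality."""
    match = re.fullmatch(r"u(\d+),b(\d+)", label)
    if match is None:
        raise ValueError(f"unexpected label format: {label!r}")
    upper, bottom = int(match.group(1)), int(match.group(2))
    if upper > ROW_CAPACITY or bottom > ROW_CAPACITY:
        raise ValueError("a row cannot hold more eggs than it has places")
    return {"upper": upper, "bottom": bottom, "total": upper + bottom}


print(parse_item_label("u3,b2"))  # {'upper': 3, 'bottom': 2, 'total': 5}
```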

Before the interview, the children were told that the interviewer wanted to know how many eggs were in the photos and that they should give the number as soon as they knew it. The children were given no time restrictions. Each child was shown photos of egg cartons on a monitor. The procedure was the same for each item: First, a closed egg carton was shown, then an open carton appeared, the child said the number, the interviewer asked how the child came to the result, the child explained, and then again a closed carton was visible.

4.2 Aspects of data analysis

The analyses are based on a qualitative, hypothesis-generating method. The theoretical model of perceiving a set and determining the cardinality described above formed the basis for the analysis. In total, the analyses were based on three different types of data, from which final interpretations about the perception process and the determination process were generated (Fig. 3).

Fig. 3 Analysis scheme (see also Sprenger and Benz, in press)

The first two data types result from video recording 1 and eye-tracking from phase 1 (Fig. 3). Both are collected as the child looks at the open egg carton on the monitor, and end as soon as the child has named the number. Video recording 1 included observed gestures, sounds or the promptness of the answer. The eye-tracking data included the eye movements of the children and provided insights into the children’s processes of perception. After the child said the number there was also data from video recording 2, collected from the children’s explanations, which happened in phase 2 (Fig. 3). The final interpretations about the perception and determination process were gained from all three different types of data (Fig. 3). The three data types resulting from video recording 1, eye-tracking, and video recording 2 are interrelated and complement each other. This interrelationship indicates that the three-stage analyzing process is very complex (Sprenger and Benz, in press).

4.3 Eye-tracking data analysis and eye-mind hypothesis

In the current study we refer to endogenous attention with overt eye movements. A set of objects was presented to the children and their attention was directed by the task “How many are there?” for the purpose of directing their perception to the set and the determination of the cardinality. To be able to analyze overt eye movements, “a method is needed to identify fixations—those eye movements which best indicate the locations of the viewer’s (overt) visual attention” (Duchowski 2017, p. 111).

The analysis of data generated with eye tracking was based on the eye-mind hypothesis, which goes back to Just and Carpenter (1980). This hypothesis suggests that “eye-movement recordings can provide a dynamic trace of where a person’s attention [emphasis added] is being directed in relation to a visual display” (Poole and Ball 2006, p. 212). This dynamic trace can be described using fixations and saccades. In the case of fixation, the eye remains in one place for a period of time that ranges from some tens of milliseconds to several seconds (Holmqvist et al. 2015, p. 21). A saccade is a quick eye movement that occurs between the fixations. Usually the eyes are moved to the next viewing position. (An exception is the regressive saccade, meaning a backtracking eye-movement, which is not discussed here.) Visual processing does not take place during saccades in order to avoid blurring of the visual image (Poole and Ball 2006). In summary, it can be said that with the help of eye-tracking it is possible to track where a person has looked, how long each fixation lasted and how their eyes moved from one place to another.

Referring to the eye-mind hypothesis, we assumed that children’s attention is where they look. However, it is important to note that these eye movement data need to be interpreted. During the circular analysis procedure in the present study, it was observed that there may be contradictions between the eye-tracking data and the data collected from other sources. For example, a child explained that he or she had counted the objects, but the eye movements showed a pendulum motion indicating a structural perception of a set (Schöner and Benz 2018).

The eye-tracking data in the current study consisted of the children’s eye movements during phase 1 of the interview (Fig. 3). The fixations were recorded with an infrared camera in order to follow the children’s eye movements. In our analysis we refer to the assumption (as already mentioned above) that these recorded eye movements were related to the person’s attention. There are different possibilities in analyzing the collected eye-tracking data. Some qualitative options are visual representations of gaze paths such as GazePlot, HeatMap or Cluster.

The GazePlot data representation uses dots to show the order of fixations in which the children look at the open egg box. This is also called the scanpath. Each dot describes a fixation and the number that appears on each dot indicates the order of the fixations. The larger the diameter of the dot, the longer the children looked at the particular spot. It is helpful to know that only the center of the circle indicates the exact fixation point, because the underlying scale can be changed as required. In order to analyze the eye-tracking data, the course of the children’s gaze was viewed several times using the GazePlot video. By this means, and based on collected data from video recordings 1 and 2, it was possible to discover certain gaze patterns as described later in this paper.

The HeatMap data representation uses a color gradient to visualize how long and how often a particular spot has been viewed. The darker the color, the longer and/or more often this place was fixated; the lighter the color, the less often it was fixated. For the analysis in the present study Accumulated HeatMaps were used to illustrate gaze patterns. In an Accumulated HeatMap, the color gradients of all selected people are displayed on one graphic (Fig. 5, left).

The Area of interest (AOI) is a helpful tool for doing statistical calculations that allow researchers to calculate quantitative eye movement measures. These include fixation counts and durations. With this tool it is possible to draw a border around an element of the eye-tracking stimulus. The software then calculates the desired metrics within the boundary over the time interval of interest. Figure 4 shows an example of such AOIs, referred to when presenting the results in this paper.

Fig. 4 Example of ten AOIs of an egg carton

A rectangular AOI was placed around each place in the egg carton. Experience from the detailed evaluations shows that children do not always look exactly in the middle of an egg when they count it. For this reason, the AOIs were not placed in a circle around the eggs, but rather in this way in order to extend the area per field slightly (Fig. 4). The individual AOIs were named Field 1, Field 2 … Field 10.
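As an illustration of how rectangular AOIs yield fixation counts and durations, the following sketch builds ten rectangular fields for a two-by-five carton and tallies hypothetical fixations. All coordinates, the field numbering (row by row, left to right) and the function names are assumptions for this example; the eye-tracking software used in the study computes such metrics internally.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AOI:
    name: str
    x: float       # left edge in screen pixels
    y: float       # top edge in screen pixels
    width: float
    height: float

    def contains(self, px, py):
        """True if the point (px, py) lies inside this rectangle."""
        return self.x <= px < self.x + self.width and self.y <= py < self.y + self.height


def make_carton_aois(left, top, cell_w, cell_h, cols=5, rows=2):
    """Create one rectangular AOI per place: Field 1 ... Field 10 (assumed order)."""
    return [
        AOI(f"Field {r * cols + c + 1}", left + c * cell_w, top + r * cell_h, cell_w, cell_h)
        for r in range(rows)
        for c in range(cols)
    ]


def aoi_metrics(fixations, aois):
    """Count fixations and sum their durations per AOI.

    fixations: iterable of (x, y, duration_ms) tuples.
    Fixations outside every AOI are ignored.
    """
    metrics = {a.name: {"count": 0, "duration_ms": 0} for a in aois}
    for x, y, duration_ms in fixations:
        for a in aois:
            if a.contains(x, y):
                metrics[a.name]["count"] += 1
                metrics[a.name]["duration_ms"] += duration_ms
                break
    return metrics


aois = make_carton_aois(left=0, top=0, cell_w=100, cell_h=100)
fixations = [(150, 50, 200), (160, 40, 300), (250, 50, 100)]  # invented data
print(aoi_metrics(fixations, aois)["Field 2"])  # {'count': 2, 'duration_ms': 500}
```

Because each rectangle covers the whole place rather than only the egg, a fixation that lands slightly beside an egg is still attributed to that field, which is the motivation given above for not using circular AOIs.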

A (pendulum-) motion between two such AOIs can be observed with a scanpath. For some children, an eye movement from left to right (AOI Field 2–AOI Field 3) or vice versa was visible; for others a real pendulum-motion was observed, for example from the AOI Field 2 to the AOI Field 3 and back again to the AOI Field 2. In order to investigate this (pendulum-) motion-phenomenon qualitatively and quantitatively, the group of children for whom eye-tracking data were available was considered in more detail. At first, the children were considered in whom such a (pendulum-) motion was observable and who also explained that they saw “four and one”. In order to get a visual impression, all gaze data of these children were presented in an Accumulated HeatMap-Graphic (Fig. 5, left). This means that these gaze data were superimposed. It can be seen on this HeatMap-Graphic that most eye movements are on the AOI Field 2 and on the left half of the AOI Field 3. However, it is also visible that the eye movements go beyond these two AOIs.

Fig. 5 Accumulated HeatMap (left); AOI Field 2 extended and AOI Field 3 extended (right)

In order to analyze the (pendulum-) motion statistically, the AOIs were adapted to the HeatMap-Graphic. Figure 5 (right) shows the extended AOIs with the new names Field 2 extended and Field 3 extended. This selection of AOIs is the result of a circular process between inductive and deductive procedures. It results in the largest intersection in the recording of all (pendulum-) motions of the children in whom such a (pendulum-) motion was observable, and who also explained that they saw “four and one”. During the qualitative analysis of the data it became clear that there are also pendulum-motions within the AOI Field 2 extended, whose fixation points for the third egg of the first row move only ‘in the direction’ of this egg, but do not reach the AOI Field 3 extended. An attempt was made to set a second condition for cases in which the fixations were located exclusively in AOI Field 2 extended, by subdividing this AOI even more finely. However, it was not possible to include all cases. This means, on the one hand, that there are pendulum-motions which are not included, and, on the other hand, that there are a few cases that are wrongly included in the group of structural perception. One of these few cases concerns the perception of a set as individual elements. Theoretically, it is possible for a child to count each egg individually while the fixation points on the first two eggs of both rows are located so that they are still within the AOI Field 2 extended (Fig. 5, right). Both of these imponderable cases are accepted in favor of the largest possible common intersection of the children with a (pendulum-) motion. This example clearly shows that the analysis of eye movements is an interpretation process and will always remain interpretative to a certain extent.

In order to describe the pendulum motion by means of the fixations, a software feature of the eye-tracking program is helpful: An example of a child for the item u3,b2 with the AOIs Field 2 extended and Field 3 extended (Fig. 5) is described in Table 1. The only basis here is the pendulum movement described above between subsets four and one.

Table 1 Example of analysis of data from one child, for the output of fixation data using AOIs

The column Fixation Index shows the order of the fixations. In the example (Table 1) there were a total of ten fixations. The first fixation is special because it is often still from the closed egg carton shown before, and the eyes have to adjust to the new situation (the opened carton) first. To ensure that this first fixation does not distort the interpretations, it is not analyzed. The value 1 in the two columns of the AOIs means that a fixation has been recorded in the corresponding AOI; the value 0 means that no fixation has been recorded. Although the representation in Table 1 is quantitative, the pendulum-motion of the fixations (indicated in bold) can also be clearly seen visually (Field 2 extended–Field 3 extended–Field 2 extended–Field 3 extended–Field 2 extended).
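The reading of such a fixation table can be sketched in a few lines of code. The rule below is an illustrative assumption rather than the study's coding scheme: the first fixation is dropped, fixations outside both AOIs are ignored, one switch between the AOIs is treated as a one-way motion, and two or more switches as a pendulum motion. The labels `"F2"` and `"F3"` and the example sequence are hypothetical and do not reproduce Table 1.

```python
def aoi_transitions(labels, skip_first=True):
    """Count switches between AOIs in an ordered fixation sequence.

    labels: one entry per fixation, e.g. "F2", "F3", or None when the
    fixation lies outside both AOIs (ignored here by assumption).
    """
    sequence = list(labels[1:]) if skip_first else list(labels)
    visited = [label for label in sequence if label is not None]
    return sum(1 for prev, cur in zip(visited, visited[1:]) if prev != cur)


def classify_motion(labels):
    """Map the number of AOI switches to a coarse motion category."""
    switches = aoi_transitions(labels)
    if switches == 0:
        return "no motion"
    if switches == 1:
        return "one-way motion"
    return "pendulum motion"


# Hypothetical sequence of ten fixations; the first one is discarded.
example = [None, "F2", "F2", "F3", "F2", "F3", "F3", "F2", None, "F2"]
switches = aoi_transitions(example)
category = classify_motion(example)
```

For the example sequence, the visited AOIs follow the pattern Field 2, Field 3, Field 2, Field 3, Field 2, which corresponds to four switches. A threshold of this kind remains a modelling decision and would need to be checked against the qualitative analysis.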

In the following results, the final interpretations of the two processes of perception and determination are examined with respect to whether a structure was perceived or used to determine the cardinality. The percentages missing from 100% correspond, for example, to cases in which no interpretation was possible or no clear hypothesis could be generated concerning the perception in structures or the use of structures. Conclusions were drawn based only on the number of possible interpretations of the two processes.

5 Results

First, all final interpretations were considered, independently of structure perception and structure usage. Table 2 shows the percentage frequencies of all final interpretations that could be generated for the two processes of perception and determination. The values refer to data from all interviewed children (n = 95). Despite the eye-tracking data, more final interpretations could be drawn concerning the determination process (between 91 and 97%) than about the perception process (between 72 and 84%).

Table 2 Percentage frequencies of all possible final interpretations of the two processes

Figure 6 illustrates the percentage frequencies of the final interpretations on the perceptual process. Only the final interpretations on perceiving a set in structures (see Fig. 1) and no structural perception (which includes perceiving a set as individual elements and perceiving a set as a whole, see Fig. 1) are taken into account. The range of perception in structures lies between 25 and 66%; the range of final interpretations in which no structure was perceived lies between 12 and 37%. The item u5,b0 is the only one that is not dominated by structural perception.

Fig. 6 Percentage frequencies of final interpretations in the perception process

Also noticeable is the high percentage frequency of structure perception in the item u3,b2: 66% of the children perceived a structure in this representation.

In the GazePlot visualizations of eye movements, a (pendulum-) motion between AOI Field 2 and AOI Field 3 (Fig. 4) could often be observed in this case. For 31 of the 95 children, either no eye-tracking data were available, there was only one fixation, or they counted each egg individually. Therefore, in order to investigate the described (pendulum-) motion-phenomenon qualitatively and quantitatively, only the group of the remaining 64 children was considered (see Sect. 4.3 for the corresponding qualitative method and derivation). The 64 children could be assigned to four categories to investigate this (pendulum-) motion: (1) no (pendulum-) motion and “four and one” not reported, (2) no (pendulum-) motion and “four and one” reported, (3) (pendulum-) motion and “four and one” not reported, (4) (pendulum-) motion and “four and one” reported (Table 3).

Table 3 Contingency table with frequencies of the four categories

The contingency table (Table 3) shows the frequencies of the four categories. Each child could be clearly assigned to a category, the individual observations were independent of each other, and the expected frequencies of the different categories were greater than 5. Thus, all prerequisites were met for using Pearson’s chi-square test of independence (Field 2018).

The test shows that there was a significant association between the explanation “four and one” and the four-one-motion of the eye movements, χ²(1) = 8.22, p < 0.004, n = 64. Based on the odds ratio, the odds of the four-one-motion of the eye movements were 8 times higher if the children reported “four and one” than if they did not report “four and one”.
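For readers who wish to retrace this kind of calculation, the following is a minimal sketch of Pearson's chi-square test and the odds ratio for a 2 × 2 table, written with the Python standard library only. The function name `chi_square_2x2` and the counts in `illustrative_table` are placeholders chosen for the example; they are not the frequencies from Table 3. The sketch assumes no continuity correction and uses the identity that, for one degree of freedom, the p-value equals erfc(√(χ²/2)).

```python
import math


def chi_square_2x2(table):
    """Pearson chi-square test (df = 1, no continuity correction) and
    odds ratio for a 2x2 table [[a, b], [c, d]].

    Assumed layout: rows = "four and one" reported / not reported,
    columns = pendulum motion observed / not observed.
    """
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    # Expected count under independence: row total * column total / n.
    expected = [[row_totals[i] * col_totals[j] / n for j in range(2)]
                for i in range(2)]
    chi2 = sum((table[i][j] - expected[i][j]) ** 2 / expected[i][j]
               for i in range(2) for j in range(2))
    # Upper tail of the chi-square distribution with one degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    # Odds of motion given report, divided by odds of motion given no
    # report; undefined if b or c is zero.
    odds_ratio = (a * d) / (b * c)
    return chi2, p_value, odds_ratio, expected


# Placeholder counts for illustration only, NOT the data of the study.
illustrative_table = [[20, 10], [10, 20]]
chi2, p_value, odds_ratio, expected = chi_square_2x2(illustrative_table)
min_expected = min(min(row) for row in expected)
```

The returned `expected` matrix allows the prerequisite on expected frequencies to be checked before the test statistic is interpreted.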

There were also two children who explained that they saw “three and two” (upper row and bottom row) and for whom a (pendulum-) motion of the gaze data could be observed. It moved between AOI Field 2 and AOI Field 7, that is, between the two rows. However, not enough data are available in this case for the association to be tested statistically. A pendulum motion as an indication of structural perception was also observed by Lindmeier and Heinze (2016) (Sect. 2.5).

Further results relate to the determination process. The focus of the study was not on whether the children named a correct or a wrong number, but on whether they perceived a structure and used it. Nevertheless, the primary aim of number determination is to obtain a result. For this reason, Table 4 gives an overview of the solution frequencies. With regard to individual items, there are interesting relations. Across the six items, between 12 and 40% of the children named a wrong number, and between 60 and 88% reported the correct result.

Table 4 Percentage frequencies of all correct or incorrect naming of the number (for all final interpretations)

The highest solution frequency was found for the item u3,b2: 88% of the children gave the correct result. It is also noticeable that the three pictures with the cardinality five have higher solution frequencies than the other three pictures.

Figure 7 shows the percentage frequencies of a structural use (light green bars) or no structural use (dark green bars) to determine the cardinality. Only the final interpretations on structural use and no structural use are taken into account. Strategies of structural use are, for example, counting strategies such as counting on or counting in steps, and non-counting strategies such as nearly doubling, (de-)composing, adding-on or structural subitizing (Sect. 2.3). The item u3,b2 also stands out in the determination process: 55% of the children used the structure to determine the number. For all other items, the non-structural strategies predominate. In particular with the item u5,b0, the counting strategy ‘counting all’ is predominantly used. This was also the only item in the perceptual process where more children perceived no structure (Fig. 6).

Fig. 7 Percentage frequencies of final interpretations in the determination process

When comparing the perception in structures and the number determination in which a structure was used, it can be seen that the structure plays a greater role in the perception process in all six items than in the determination process (Fig. 8). Thus, more final interpretations concerning structure can be made about the process of perception.

Fig. 8 Percentage frequencies of final interpretations in both processes concerning structure

A strategy that uses structure presupposes a perception of structure, but conversely a perception of structure does not necessarily result in a strategy of determination that uses structure. This assumption becomes visible in the diagram, since the green column (determination process) is always a subset of the blue one (perception process).

6 Interpretation

General statements that can be made on the basis of all final interpretations are that fewer interpretations could be made overall about the process of perception than about the process of determination (Table 2). One possible explanation could be that the children repeatedly used their fingers to count and thus their hand covered the eye-tracking camera. Accordingly, in these cases no data could be obtained from the eye movements. It is to be expected that this behavior of the children will change as they get older and are less dependent on counting to determine the cardinality. Also, it can be stated that clearly more numbers are mentioned correctly than incorrectly (Table 4). In addition, the highest solution frequency could be found for all three representations with five eggs.

The research questions addressed in this paper were how preschool children perceive structures and how they use them to determine cardinality. It can be noted that the structure plays a more important role in perception than in determination (Fig. 8). One possible interpretation is that the children already perceive the structure but cannot (yet) use it to determine the cardinality. The item u3,b2 seems to play a special role: 66% of the children perceived a structure in this representation (Fig. 6). Possible interpretations are that small subsets can be built in this item that can be perceived simultaneously (for example “two and three”) or that the dice pattern of the four plays a role. Many of the children explained that they saw “four and one”, which in turn would support this interpretation. This could also be a possible reason why the children most frequently determined the correct number for this item (88%) (Table 4). The item u3,b2 is the only item in which the structural strategies predominate in the determination process (Fig. 7). The substructures “four and one” as well as “three and two” are primarily used for perception and determination. In further analyses it must be examined whether the dice pattern of the four is also perceived for the item u5,b2 and used to determine the cardinality.

There is a significant association between the explanation “four and one” and the (pendulum-) motion “four and one” observed in the eye movements (Tables 1, 3 and Fig. 5). One assumption is that only in this item two small groups are perceived (“four and one” or also “three and two”). Possibly the dice pattern of the four is dominant. With the item u5,b0, a perception in structures is least observable (Fig. 6) and the children rarely used structure-using strategies to determine the cardinality (Fig. 7). This observation allows a variety of interpretations. One explanation could be that a row of five eggs offers fewer possibilities for building subsets and the children therefore count the eggs. Another interpretation could be that the children do not yet know that a full row contains five eggs.

In summary, it can be stated that children at this age are able to perceive structures and also use them to determine the cardinality. The assumption that the children primarily perceive and use small visual quantities that are familiar to them seems obvious. An indication of this is the item u3,b2, which is primarily divided into “four and one” and “three and two”, while other items, such as u5,b0, are more likely to lead children into counting all. Probably the children have not (yet) acquired the knowledge that there are five eggs in a full row of a box of ten eggs and therefore cannot use it to determine the cardinality.

7 Discussion and conclusion

As far as we know there is no comparable study on this issue that deals with eye-tracking and 5-year-old children. Therefore, it is a particular challenge to analyze the eye-tracking data so that they can be an adequate help in describing possible perception patterns. For example, the classification of the AOI as a basis for the statistical hypothesis testing of the pendulum motion was created by a qualitative, circular process between inductive and deductive procedures (Sect. 4.3): The interpretation of the eye-tracking data was possible only in combination with the other two data sources resulting from video recording 1 and video recording 2 of the children. The resulting gaze patterns could then be applied to further analyses of the eye-tracking data. By means of a HeatMap it is possible to gain a visual overview of the gaze areas that were particularly intensively considered. These visualizations (Fig. 5, left) can serve as a basis for further analysis. In the present case, AOIs were found on the basis of the HeatMap graphics (Fig. 5, right), which in turn served as the basis for statistical calculations (Tables 1 and 3). It should be noted that these statistical calculations are based on interpreted scanpaths. Therefore, a sufficiently large sample size is required.

A challenge in the analysis of the eye-tracking data is also the interpretative aspect of the gaze data. Although the child’s attention is at the point where the eyes are fixed (see eye-mind hypothesis, Just and Carpenter 1980), it is known from research that information that is not directly fixed can also be registered (e.g., Posner 1980). This covert orientation (Sect. 2.5) is often formulated as a critical point of the eye-mind hypothesis, since in this case information can also be received in the peripheral visual field. Another aspect that is criticized is that the eye-mind hypothesis does not apply if the person is in a situation of emotional excitement, such as stress or panic, for example because he or she does not know the solution to a task that is too difficult (Schindler and Lilienthal 2019, p. 134). These points, in turn, suggest that final interpretations about the perception process should not rely exclusively on eye-tracking data. This is the reason why a hypothesis-generating procedure with three different types of data was used in the present study. In contrast to Schindler et al. (2019), the verbal explanations of the children should be mentioned here in particular, which in addition to the data of video recording 1 play an important role in the analysis of the eye-tracking data and thus in building final interpretations for the perception process. However, if the sample size is very small, as in the studies described in Sect. 2.5, only very cautious hypotheses can be made.

A characteristic and strength of the study is the data triangulation as well as the differentiation of the two processes of perceiving a set and determining the cardinality. Eye-tracking supported by the other two data types provides insights into the structural perception process. These insights can in turn serve as the basis for early mathematical education:

Implications for early mathematical education can be derived from the result that children often already seem to have a perception of a set in structures. Unconscious perception can be brought into the focus of attention and thus becomes conscious. Already known patterns (for example the dice pattern of the four) can be used and extended. Here it is important to talk, discuss and argue repeatedly with the children, i.e., to verbalize this perception. In mathematics education, there is also a broad consensus that mathematics education in kindergarten should take place in meaningful and playful natural learning situations (Gasteiger 2015). As a prerequisite for a playful exploration and development of a structural perception and use, appropriate materials are important (e.g., eggs and egg cartons). In most cases, many of these materials are already available in the kindergartens, and it is a matter of awakening the awareness of structures in the kindergarten teachers on the one hand, and of finding a suitable way to communicate about structures on the other hand. Communication about structures, not only about spatial structures, can help to build a first awareness of structures and their use. Further research is needed to find out which communication tools are helpful to foster this.