1 Introduction

Spatial reasoning is seen as a crucial cognitive ability. We need it to find our way, manipulate objects, and imagine situations. Spatial skills are also foundational in many scientific disciplines and the daily work of technologically skilled workforces (National Research Council 2006). Moreover, many studies have shown that at all levels of schooling spatial skills are essential for learning processes in science, technology, engineering, and mathematics (e.g., Wai et al. 2009) and that spatial education should start at a young age (Newcombe 2010; Verdine et al. 2014).

The National Council of Teachers of Mathematics (NCTM 2000, 2006; Goldenberg et al. 2014) also underlines this emphasis on early spatial education by highlighting that teaching geometry (the mathematics domain most closely related to spatial ability) should not start with plane geometry focusing on naming shapes and knowing attributes of shapes, which was long the typical way of teaching geometry (Clements 2004), but with spatial reasoning. From kindergarten on, children should be given learning opportunities to further develop their spatial skills. In fact, this approach to geometry, starting with spatial geometry instead of plane geometry, was already argued for by Freudenthal (1973).

In the NCTM (2000) Standards for geometry in K-2 grades, attention to spatial skills means that much attention is paid to specifying locations (including interpreting relative positions in space) and using visualization (including creating mental images of geometric shapes using spatial memory and spatial visualization and recognizing and representing shapes from different perspectives). Similar approaches can be found in curricula in England (Department for Education 2013) and Australia (Board of Studies New South Wales 2012). They are also part of the mathematics curricula of the Netherlands (Van den Heuvel-Panhuizen and Buys 2008) and Cyprus (Cyprus Ministry of Education and Culture 2010).

Despite the relevance attached to the development of spatial reasoning skills in young children there are still questions about how they develop. The present study had as its goal to contribute to gaining more insight into this development. The focus was on kindergartners’ (4–5 years of age) ability to mentally take a particular point of view, which we call ‘imaginary perspective-taking’ (IPT).

2 IPT and its development in early childhood

2.1 Piaget’s work

Seminal work on IPT has been undertaken by Piaget and Inhelder (1956). One task they used to investigate children’s IPT was the well-known Three Mountains task. In this task a 3D model of three mountains of different height is placed on a table. The interviewed child is sitting at the table together with a doll which is placed at a different position. The child is asked to explain what the doll can see. It was found that children up to the age of nine or ten were not able to take a perspective from another position, which Piaget ascribed to children’s egocentrism. However, a review by Newcombe (1989) of subsequent research revealed that many studies rejected these age norms, showing that children can overcome their egocentrism early in the preschool years. This was, for example, confirmed by a later study of Newcombe and Huttenlocher (1992), providing evidence that even 3-year-olds could solve perspective-taking problems. Even so, it was also found that these shifts from an egocentric to an allocentric reference frame showed considerable variability as a function of stimuli and testing condition (Newcombe et al. 2013).

2.2 Definition of spatial ability and its subcomponents

Despite the long research tradition there is still no clear consensus on the definition of spatial ability and its subcomponents (Hegarty and Waller 2004). Newcombe et al. (2013) suggest there are enough reasons to assume that in essence there are two kinds of spatial skills: between-object representation and transformation skills; and within-object representation and transformation skills. Newcombe et al. (2013) consider a perspective-taking task part of the first kind of skill, while a mental rotation task is seen to fit the second. This means that IPT and mental rotation seem to be distinct abilities (Zacks and Michelon 2005). However, although these are dissociable skills, Hegarty and Waller (2004) also indicate that responses in perspective-taking tasks and mental rotation tasks are often found to be correlated. Furthermore, these authors emphasize that self-rotation and object-rotation can both lead to the same result. Nevertheless, they acknowledge that these two tasks are not equivalent in difficulty. In general, mental self-rotation has been repeatedly reported as easier than object rotation (Kessler and Thomson 2010). More about mental rotation, especially about 2D and 3D rotations, can be found in Bruce and Hawes (2015).

2.3 Two types of IPT

The ability of IPT, that is, mentally representing a viewpoint different from one’s own, can itself be divided into subcomponents. Following Masangkay et al. (1974), Flavell et al. (1981) proposed and validated a distinction for these subcomponents into two abilities of perspective-taking. Both were tested by questions about cards placed between the experimenter and the child, where the child had to take the experimenter’s perspective. The so-called Level 1 competence concerns the visibility of objects: it implies the ability to deduce which objects are visible or not from the other viewpoint. The Level 2 competence relates to the appearance of objects: it implies the ability to indicate how an object looks when it is seen from a different viewpoint. Hughes (1975, as cited by Donaldson 1980) has independently proposed a very similar model, in which “projective” and “perspective” abilities correspond respectively to Level 1 and Level 2 competence.

According to Flavell et al. (1981) both IPT competences are acquired by children as young as 5 years of age, thus challenging Piaget’s claim that young children’s egocentrism interferes with their ability to take a perspective different from their own. Specifically, 3-year-olds performed well on Level 1 tasks but had difficulties with Level 2 tasks, even after a brief training. Usually this Level 2 competence is attained around 4 or 5 years of age (Pillow and Flavell 1986). This difference in development is rather plausible if the cognitive demands of the two types of tasks are taken into account.

2.3.1 Level 1 competence

Although the Level 1 competence only entails whether an object can be seen from the perspective of another observer, not involving the how aspect, this competence does imply a transformation of one’s own perspective into the perspective of another observer and the ability to make non-egocentric inferences about the visual experiences of this other observer. According to Salatas and Flavell (1976), this perspective transformation and making non-egocentric inferences mean that a child knows a given observer at a particular position only has one view (“1 observer, 1 unique view”), and a given view cannot be seen from more than one position (“different positions, different views”).

To determine whether an object is visible, a possible strategy is to imagine oneself in the other position, projecting an observer’s line of sight and verifying whether the target object meets with this line (Yaniv and Shatz 1990). However, Michelon and Zacks (2006) suggested that this Level 1 competence does not require a transformation of one’s egocentric reference frame, but the line-of-sight tracing might be an imaginary process analogous to mental scanning (or visualization) as if an actual line is drawn between the other observer and the target object.

2.3.2 Level 2 competence

In contrast with the Level 1 competence in which the child “only” has to decide whether an object is visible or not, the Level 2 competence means that a child also has to deal with multiple aspects of the visual appearance of an object, including features such as size, shape, and location, and has to understand that these features differ when an object is seen from different perspectives (Pillow and Flavell 1986). In other words, Level 2 competence involves applying specific knowledge about how changes in the observer–object relationship influence aspects of the appearance.

As shown in a study by Flavell et al. (1980), 4-year-old children could successfully judge that a viewer nearer to a small object could see it better (more clearly and precisely) than an observer farther away. Furthermore, 4-year-old children understood both from their own perspective and another viewer’s perspective that when objects are farther away they look smaller, and that when they are closer they appear larger (Pillow and Flavell 1986). Three-year-old children failed to respond to these tasks (Pillow and Flavell 1986; Flavell et al. 1980) indicating their limited awareness of projective size–distance relationship. Pillow and Flavell (1986) also found that 4-year-old children understood how the orientation of an object should be modified to make it look more circular or more elliptical both from their own perspective and another viewer’s perspective, indicating a clear understanding of the projective shape as a function of the object’s orientation in relation to the observer’s line of sight. Three-year-old children again failed to respond to this task, indicating their limited awareness of projective shape–orientation relationship.

In a recent study by Frick et al. (2014), 4- to 8-year-olds’ performance in IPT was measured by showing them photographs of scenes where a toy photographer takes a picture of a layout of objects (e.g., a cone and a cylinder) from a particular angle. Then the children had to choose which of four pictures was the one taken from the viewpoint of the photographer. Compared with previous studies that showed that IPT Level 2 competence is attained around 4 or 5 years of age (Pillow and Flavell 1986), the findings of this study were slightly different, maybe as a result of how the IPT tasks were operationalized. The 4-year-olds responded, as expected, near chance level, and the children around the age of six performed better, but only at the age of seven to eight did egocentric responses decrease markedly, and even for the 8-year-olds considerable individual variability was found.

3 Relation between IPT competence and other child characteristics

3.1 Mathematics ability

A large body of research provides evidence for a strong relationship between spatial and mathematics abilities. People who have better scores on spatial ability tests generally also perform better on tests of mathematics ability (for an overview, see Mix and Cheng 2012). Moreover, these relations do not only apply to overall mathematics scores and those mathematical domains that are ostensibly spatial. For example, relations for mental rotation were obtained not only with geometry, but also with word problems and mental arithmetic (Kyttälä and Lehto 2008). Spatial visualization tasks requiring mental transformations of 2D and 3D objects were found to correlate not only with measurement skills (Casey et al. 2011), but also with students’ performance on arithmetic word problems (Hegarty and Kozhevnikov 1999) and counting skills (Kyttälä et al. 2003).

Although most studies on the relation between spatial abilities and mathematics performance focused on older children and adolescents, it has also been shown that at age three, children’s ability to count, represent number, and do simple addition and subtraction problems is related to their spatial skills (e.g., Gunderson et al. 2012; Verdine et al. 2014). Moreover, in line with the concurrent and predictive relationships found between spatial skills and mathematics achievement, there is evidence that enhancing children’s spatial skills can contribute to improving their mathematical skills (Newcombe 2010; Cheng and Mix 2014).

On the relation between the performance in perspective-taking and mathematics ability, the only study found in the review by Mix and Cheng (2012) was that by Guay and McDaniel (1977). This study investigated whether the ability to coordinate multiple viewpoints was related to children’s performance on the Iowa Mathematics Tests of Basic Skills. In the perspective-taking task the children were seated at a round table and had to observe a particular three-dimensional geometric object (e.g., cube, pyramid). Then they were asked to imagine themselves sitting in another place and select from three line drawings projected on a screen the drawing representing the object’s appearance from the specified viewing position around the table. Interestingly, a significant positive relation between perspective-taking and mathematics ability was found in grades 5, 6, and 7, but in grades 2–4 the relation was not significant.

3.2 Gender

Findings on the relation between gender and spatial ability are mixed. For example, Lachance and Mazzocco (2006) in a longitudinal study on children in lower primary school did not find sustainable gender differences in spatial ability, but many other studies provided evidence that males on average perform better than females on many spatial tasks. In particular for mental rotation tasks, this male advantage is highly robust (e.g., Voyer et al. 1995; Geary et al. 2000).

These gender differences in spatial skills, which are mostly obtained for older students and adults, also appear at a younger age. Even five-month-old male infants show more evidence of mental rotation than female infants (Moore and Johnson 2008). For kindergartners several studies also found a male advantage for 3D mental rotation (Casey et al. 2008), 2D mental rotation as well as translation (Levine et al. 1999), and spatial visualization (Tracy 1987). However, Ehrlich et al. (2006) found that though boys performed better than girls at age 5 on spatial transformation tasks including mental rotations, both boys and girls improved with training.

Testing conditions have also been shown to influence gender differences in spatial skills. For example, Horan and Rosser (1984) investigated children aged 4, 6, and 8 years who were offered dimension-transcending tasks in which the questions and the answers were formulated in a different dimension: a three-dimensional object was shown and the child was asked how this object looked to an observer in another position after the object was rotated 90°. The child could answer the question by selecting the correct two-dimensional picture. In these dimension-transcending tasks, boys performed better than girls. Nevertheless, if both question and answer were presented in the form of two-dimensional pictures, girls performed better than boys.

Studies specifically focusing on gender differences in perspective-taking are scarce. In a study by Liben (1978) on the perspective-taking skills of children from 3 to 7 years of age, no effects of gender were found in the children’s performance on perspective-taking tasks requiring the selection of pictures illustrating how block arrays appeared to them and to an experimenter positioned opposite them. Similarly, Newcombe and Huttenlocher (1992) found no gender differences for perspective-taking in their study in which they asked young children which object would be in a certain position relative to another observer.

3.3 Cultural aspects

Although Berry (1971) suggested that the development of spatial thinking is related to various aspects of learners’ culture, such as physical environment, language, and social practices, cognitive science literature also widely assumes that spatial thinking varies minimally across cultures. This interpretation aligns with, for example, the viewpoint of Montello (1995), who pointed out that many substantial aspects of spatial cognition are common to people world-wide, including (1) similarities in the organization of the human nervous system, (2) common structures and processes of the human body, (3) similarities in learning and socialization, and (4) similarities in the residential environments of people. Yet Montello (1995, p. 496) also suggested that there are important cultural differences in spatial cognitive structures and processes which, however, “occur primarily between traditional and technologically-developed cultures, not between different technologically developed cultures”.

This latter view is more in agreement with recent studies revealing the existence of culture-specific spatial schemes in gestures (Kita et al. 2001) and culture-specific differences in the spatial interpretation of time (Núñez and Sweetser 2006). Mitchelmore (1980) further indicates that cultural and environmental factors can influence spatial ability, finding significant differences in 3D drawing ability between West Indian, American, and English children.

Besides the influence of cultural aspects in general on spatial abilities, the formal education learners receive, specifically the teaching approaches, can significantly affect the development of spatial skills (Bishop 1980; Newcombe 2010). For example, in an investigation into the improvement of spatial ability in children from kindergarten through first grade, Huttenlocher et al. (1998) found that children’s spatial thinking improved to a greater extent during the school year than over the summer period, indicating that experiences in school contributed to the development of children’s spatial skills. Furthermore, in a study by Bishop (1973), children from primary schools which focused on the use of manipulatives demonstrated higher performance in spatial ability tests than children from primary schools which did not use materials. In other words, when countries differ in their teaching approaches this might affect children’s spatial abilities.

Regarding spatial perspective-taking, culture-related differences have received limited attention. In an early study, Knudson and Kagan (1977) investigated the ability to take the perspective of another in two age groups of children (5 to 6 and 7 to 9 years of age) with either an Anglo-American or a Mexican-American cultural background. The findings showed no significant cultural differences in children’s perspective-taking performance.

4 Research questions

The present study was set up to further explore the spatial abilities of kindergartners and gain more insight into their performance in imaginary perspective-taking (IPT). The focus of this study was on the two subcomponents of IPT distinguished by Masangkay et al. (1974) and Flavell et al. (1981): the abilities to imagine what is visible from a particular point of view (IPT type 1: visibility) and how an object or scene will look from a particular point of view (IPT type 2: appearance).

Considering that task factors (Flavell et al. 1981) and testing conditions (Newcombe et al. 2013) can very powerfully affect children’s tendency to express competences and that most studies on IPT so far used tasks that included three-dimensional displays only, or three-dimensional displays in combination with two-dimensional representations (photographs, drawings), it would be interesting to know how young children can deal with two-dimensional representations of three-dimensional situations. Therefore, to add to the existing knowledge on young children’s IPT ability, in the present study children’s IPT performance was assessed with two-dimensional representations only for both the question and the possible answers.

Furthermore, given the small number of recent studies into kindergartners’ IPT competences and their few or incongruent results about how IPT relates to children’s characteristics, we also investigated whether characteristics including time spent in kindergarten, gender, and mathematics ability influence children’s IPT performance. We wanted to explore as well whether there are cross-cultural patterns in the IPT performances by children in different countries. In light of the above, the following research questions guided our study:

  1. How do kindergartners perform when they have to solve IPT type 1 items (visibility) and IPT type 2 items (appearance)?

  2. How is kindergartners’ performance in these two types of IPT items related?

  3. Is there a relationship between kindergartners’ performance in the two types of IPT items and their kindergarten year, gender, and mathematics ability?

  4. Do the answers to the previous questions differ for children in different countries?

5 Methods

5.1 Set-up of the study

To address the research questions a survey was carried out in the Netherlands and in Cyprus. In this survey we assessed kindergartners’ performance in IPT by administering two test booklets, each with items about imagining visibility and about imagining appearance (Footnote 1).

5.2 Participants

5.2.1 The Netherlands sample

The participating children in the Netherlands were from kindergarten classes in primary schools in the province of Utrecht. To limit differences in teaching methods, schools with a specific educational approach, such as Montessori schools or Peter Petersen schools, were excluded. All 18 schools that took part in the study had integrated kindergarten classes with both first-year (K1) and second-year kindergartners (K2). Each school participated only with one class. The total sample in the Netherlands included 384 kindergartners. Children who did not complete both test booklets were excluded from the analysis, reducing the Netherlands sample to 334 children, 176 girls and 158 boys; 123 children were in K1 and 211 children were in K2. The K1 children had an average age of 4.67 years and the K2 children were on average 5.69 years old (Table 1).

Table 1 Sample composition

The children’s mathematics ability was based on their score on a test developed by the Central Institute for Test Development (Cito). This test is widely used in schools in the Netherlands to monitor children’s development in mathematics. It has different versions for K1 and K2 children (Footnote 2). Based on national reference samples of K1 and K2 children, the test scores were converted, for each grade separately, into a mathematics ability level ranging from Level 4 (the highest level) to Level 1 (the lowest level), each containing 25 % of the scores in these reference samples (Footnote 3; see Table 2). We also tested whether there were grade differences in mathematics ability. In the Netherlands, there was no difference between K1 and K2 children (K2: M = 2.90, K1: M = 2.96, t(322) = −0.45, p = .65, d = −.05).

Table 2 Children’s mathematics ability

5.2.2 The Cyprus sample

The participating kindergartners in Cyprus were also from primary schools with kindergarten classes. The ten schools involved were situated in the province of Nicosia. All schools follow the common regular curriculum, proposed by the Cyprus Ministry of Education and Culture. Four schools had integrated kindergarten classes with both first-year (K1) and second-year kindergartners (K2), while six schools had K1 and K2 children in separate classes.

Also different from the Netherlands sample, the schools in Cyprus participated with more than one class, for a total of 23 classes involving 364 kindergartners. Children who did not do both test booklets were excluded from the analysis, which reduced the Cyprus sample to 304 children, 163 girls and 141 boys; 86 children attended K1 and 218 children were in K2. The K1 children had an average age of 4.67 years and the K2 children were on average 5.61 years old (see Table 1).

As there is no mathematics test in Cyprus that is widely used in schools to assess children’s development, the mathematics ability level of the kindergartners was based on teachers’ perceptions of their students’ level in mathematics. Specifically, every teacher whose class participated in the study was asked to categorize the K1 and K2 children of her class into four levels of mathematics ability: Level 4 was meant for the children who, taking the national populations of respectively the K1 and the K2 children as a reference, belong to the 25 % highest scoring children in mathematics, Level 3 for the next 25 %, Level 2 for the next 25 %, and Level 1 for the children who at a national level belong to the 25 % children with the lowest mathematics ability. The distribution over the four mathematics ability levels of the children in the Cyprus sample is shown in Table 2. In Cyprus, K2 children were judged of higher mathematical ability than K1 children (K2: M = 3.22, K1: M = 2.94, t(297) = 2.69, p < .01, d = .35).

5.3 The IPT items

To assess how able kindergartners are in IPT, a series of pictorial paper-and-pencil items was developed (see “Appendix” for the complete series of items; Footnote 4), including seven items referring to IPT type 1 (visibility) and six items about IPT type 2 (appearance). For example, the Duck item (IPT1Duck; see “Appendix”) is used to measure IPT type 1. In this item the children were asked what the duck, which has fallen into the hole, sees when he looks up. The Soccer item (IPT2Soccer; see “Appendix”) measures IPT type 2. Here the children had to determine how the scene on the soccer field looks from above. All items have a multiple-choice format and each covers one page with an illustration of the problem situation and four small drawings representing the possible answers. After a test item was read aloud in class, the children had to answer by underlining the drawing representing the correct answer.

Before the test was used for the data collection in our study, the items were piloted, leading to a revision of some items to make the wording and drawings clearer. The final versions of the items were split up over two booklets to be administered on different days with a 1-week interval. The data collection was carried out by trained test administrators both in the Netherlands and in Cyprus. Correct responses were coded as 1, and incorrect ones as 0.

In the Netherlands sample, the item-total correlations were rather low both for the IPT type 1 items and for the IPT type 2 items, meaning that the items differed considerably (IPT1: M = .46, SD = .08; IPT2: M = .44, SD = .05). In the Cyprus sample, the item-total correlations were also rather low (IPT1: M = .45, SD = .07; IPT2: M = .42, SD = .03). In addition we calculated the reliability of the sets of IPT items by using the omega measure (ω), which is generally seen as less biased than Cronbach’s alpha (see Revelle and Zinbarg 2009). For the sample in the Netherlands, the reliability of the IPT type 1 items was ω = .52 and for the IPT type 2 items this was ω = .26. For the sample in Cyprus, these values were ω = .44 and ω = .25 respectively. The reliabilities found are below the often-used minimal criterion of .70. However, given the small number of items and the heterogeneous nature of IPT, such low reliabilities can be expected (Cortina 1993).
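As an illustration of the item-total correlations reported above, the following minimal sketch computes, for each item, the Pearson correlation between the 0/1 item score and the total score. The data are hypothetical toy values, the function names are ours, and the uncorrected variant (the item is included in the total) is an assumption, since the text does not state which variant was used.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation between two equally long numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)

def item_total_correlations(responses):
    """responses: one list of 0/1 item scores per child.
    Returns, per item, the correlation of that item with the total score
    (uncorrected: the item itself is part of the total)."""
    n_items = len(responses[0])
    totals = [sum(child) for child in responses]
    return [
        pearson([child[j] for child in responses], totals)
        for j in range(n_items)
    ]

# Hypothetical toy data: 5 children, 3 items (1 = correct, 0 = incorrect).
toy = [
    [1, 1, 0],
    [1, 0, 0],
    [1, 1, 1],
    [0, 0, 0],
    [1, 1, 0],
]
correlations = item_total_correlations(toy)
```

Averaging such per-item values within an item type would give summary figures of the kind reported as M and SD above.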

6 Results

6.1 Kindergartners’ performance in the two types of IPT items

6.1.1 Kindergartners’ performance

Table 3 shows that for the total sample of kindergartners in the Netherlands the visibility items (IPT type 1) (M = .70) were generally easier than the appearance items (IPT type 2) (M = .41). This difference was found to be significant, t(333) = 21.98, p < .001. In the Cyprus sample the visibility items were also significantly easier (M = .58) than the appearance items [M = .32, t(303) = 16.82, p < .001].
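The degrees of freedom reported here (333 for 334 children, 303 for 304 children) are those of a paired comparison, in which each child contributes a visibility score and an appearance score. A minimal sketch of such a paired t statistic, using hypothetical toy scores and a function name of our own, could look as follows.

```python
from math import sqrt

def paired_t(x, y):
    """Paired-samples t statistic and its degrees of freedom.
    x and y hold two scores per child, in the same child order."""
    diffs = [a - b for a, b in zip(x, y)]
    n = len(diffs)
    mean_d = sum(diffs) / n
    # Sample variance of the within-child differences (n - 1 in the denominator).
    var_d = sum((d - mean_d) ** 2 for d in diffs) / (n - 1)
    return mean_d / sqrt(var_d / n), n - 1

# Hypothetical proportion-correct scores for four children.
visibility = [0.7, 0.6, 0.9, 0.5]
appearance = [0.4, 0.5, 0.5, 0.3]
t_value, df = paired_t(visibility, appearance)
```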

Table 3 Mean score for each item for the whole sample and each kindergarten year in the Netherlands and in Cyprus; differences in mean scores between kindergarten years in both countries and differential item functioning between both countries (country DIF)

For both types of items, kindergartners in the Netherlands performed better than those in Cyprus, visibility: NL: M = .68 and CYP: M = .56, t(636) = 7.45, p < .01, d = .60; appearance: NL: M = .39 and CYP: M = .33, t(636) = 3.69, p < .01, d = .29 (Footnote 5). Despite these differences, the correlations of the item difficulties between the Netherlands and the Cyprus sample were rather high, visibility: r = .73; appearance: r = .75. This means that the rank order of the difficulty level of the items is quite similar in both countries.
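The effect sizes in this comparison can be related to the t values and the sample sizes given in Sect. 5.2 (334 and 304 children). The sketch below assumes that d is Cohen’s d based on the pooled standard deviation, which the text does not state; under that assumption d = t · sqrt(1/n1 + 1/n2). The function name is ours.

```python
from math import sqrt

def cohens_d_from_t(t, n1, n2):
    """Cohen's d recovered from an independent-samples t statistic,
    assuming a pooled-standard-deviation t-test (an assumption here)."""
    return t * sqrt(1 / n1 + 1 / n2)

# Reported t values and analysed sample sizes (NL: 334, Cyprus: 304).
d_visibility = cohens_d_from_t(7.45, 334, 304)
d_appearance = cohens_d_from_t(3.69, 334, 304)
```

Under this assumption the recovered values are about .59 and .29, close to the reported d = .60 and d = .29; the small gap for the visibility items may reflect rounding or a different variance estimate.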

To further examine whether there are differences between the performances of kindergartners in the Netherlands and in Cyprus we inspected the differential item functioning (DIF). For both IPT types we found marginally significant country DIFs (Footnote 6). For the visibility items, the country mean difference was slightly higher than the difference in the average country means of all items, resulting in a relative advantage for the kindergartners in the Netherlands (DIF = .02, p = .06). The reverse was true for the appearance items, resulting in a relative advantage for the kindergartners in Cyprus (DIF = −.02, p = .06).

In the Netherlands, the visibility items IPT1Wall and IPT1Hole were the easiest. Probably this is due to the context of the items, which may be quite familiar for children, because in their play activities it may have occurred often that due to their limited height they cannot see what is beyond a high wall. Also for the situation with the hole in the door children might have experienced what is the best place to see the most. In both items the watching subjects are featured in the drawing, which is not the case in IPT1Umbrella, the hardest visibility item in the Netherlands sample.

For the appearance item IPT2Fence, the kindergartners in the Netherlands had the highest proportion of correct answers. In this item the children had to find the proper position of the bird behind the fence to see the bird in a particular way. As for IPT1Hole, the children might have succeeded in solving this item because of experiences with peeking through a crack in a fence. In contrast, IPT2Table seemed much more complex even though the drawing includes the watching girl and the appearance of the table; the children only have to find the proper position of the girl to see the table in that particular way. Apparently, this is more difficult than finding the position of the observed bird in IPT2Fence, which only varies with respect to the distance from the fence. The position of the girl in IPT2Table does not vary as a function of distance, but as a function of her spatial location with respect to the table.

Similar to the Dutch results, for the kindergartners in the Cyprus sample, IPT2Fence was the easiest appearance item and IPT2Table the most difficult. Furthermore, IPT1Wall and IPT1Hole were also the easiest visibility items for the children in Cyprus, as well as IPT1Tower which was also easy for the Dutch kindergartners. However, in the Cyprus sample the most difficult visibility item was not IPT1Umbrella but IPT1Duck, which in the Netherlands had a mean score of .60 while in Cyprus this was .34. This finding was confirmed by a significant country DIF (.17, p < .01).

6.1.2 Kindergartners’ performance in K1 and K2 separately

Similar to the results in the total sample of each country we also found for the two kindergarten years that the visibility items were significantly easier than the appearance items. The findings for the Netherlands sample for K1 were: visibility M = .62, appearance M = .32, t(122) = 14.56, p < .01; and for K2: visibility M = .74, appearance M = .46, t(210) = 16.60, p < .01. In the Cyprus sample we found for K1: visibility M = .51, appearance M = .32, t(85) = 6.32, p < .01; and for K2: visibility M = .61, appearance M = .34, t(217) = 16.40, p < .01.

Moreover, the Dutch kindergartners outperformed the kindergartners in Cyprus. In the visibility items this was the case for both kindergarten years: K1 [NL: M = .62 and CYP: M = .51, t(207) = 3.68, p < .01, d = .52], K2 [NL: M = .74 and CYP: M = .61, t(427) = 7.39, p < .01, d = .71]. In the appearance items we found a significant difference only in K2: K1 [NL: M = .32 and CYP: M = .30, t(207) = −0.15, p = .88, d = −.02], K2 [NL: M = .46 and CYP: M = .33, t(427) = 6.48, p < .01, d = .63]. But again the item difficulties in the Netherlands and the Cyprus samples were moderately correlated in K1 and K2 for both types of items, visibility: K1 (r = .54), K2 (r = .80); appearance: K1 (r = .69), K2 (r = .62). This means that the rank order of the difficulty level of the items is similar in both countries also in the two kindergarten years.

Comparing the scores in the two kindergarten years for each of the two item types, we found for the total score in the visibility items that the K2 children significantly outperformed the K1 children in both countries [NL: M_K2 − M_K1 = .12, t(332) = 5.83, p < .01, d = .66; Cyprus: M_K2 − M_K1 = .09, t(302) = 3.65, p < .01, d = .46]. However, for the total score in the appearance items this was only so for the Netherlands sample and not for the Cyprus sample [NL: M_K2 − M_K1 = .14, t(332) = 6.71, p < .01, d = .76; Cyprus: M_K2 − M_K1 = .02, t(302) = 0.80, p = .43, d = .10]. In the Cyprus sample there was even one appearance item in which the K1 children clearly outperformed the K2 children, that is, IPT2Mouse (M_K2 − M_K1 = −.18). However, after excluding this item from the appearance items when comparing the two years in the Cyprus sample, there was a significant difference between the two kindergarten years [M_K2 − M_K1 = .06, t(302) = 2.18, p = .03, d = .28].

6.2 Relationship between the two types of IPT items

To investigate the relationship between the two types of IPT items we first calculated the correlation between the scores for these two types of items. In the Netherlands sample we found a manifest correlation of r = .25. A correction for attenuation using omega reliabilities resulted in a latent correlation of r_lat = .68. For the Cyprus sample the manifest correlation was r = .18 and the latent correlation was r_lat = .54.
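As an illustration of the correction for attenuation, the sketch below uses the classical formula r_lat = r / √(ω1 × ω2), where ω1 and ω2 are the reliabilities of the two scales. The omega values themselves are not given in this section, so the sketch only works backwards to the product of reliabilities implied by each reported pair of correlations. It assumes that this classical formula was the one applied; the function names are hypothetical.

```python
import math


def disattenuate(r_manifest: float, rel_x: float, rel_y: float) -> float:
    """Classical correction for attenuation: r / sqrt(rel_x * rel_y)."""
    return r_manifest / math.sqrt(rel_x * rel_y)


def implied_reliability_product(r_manifest: float, r_latent: float) -> float:
    """Product of the two reliabilities implied by a manifest/latent pair."""
    return (r_manifest / r_latent) ** 2


# Reported correlations (manifest, latent) for each sample.
product_nl = implied_reliability_product(0.25, 0.68)
product_cyp = implied_reliability_product(0.18, 0.54)

print(round(product_nl, 3), round(product_cyp, 3))
# prints: 0.135 0.111
```

A small implied product indicates that the two short scales measure their constructs with limited reliability, which is why the latent correlations are so much higher than the manifest ones.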

Next we explored the relationship between the individual items of both types by performing a statistical implicative analysis for each country (Lahanier-Reuter 2008) using the CHIC (Classification Hiérarchique, Implicative et Cohésitive) software (Bodin et al. 2000). The statistical implicative analysis can reveal whether an item implies another item, which means that if a subject succeeds in the former item, the same subject generally also succeeds in the latter item. The results of the statistical implicative analyses are given in diagrams, which show graphically how the items are related. The direction of an arrow specifies the implicative relationship that was found. We only included implicative relationships which have at least an 85 % probability of being identified correctly.
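To make the idea of an implicative relationship concrete, the sketch below computes two quantities for a pair of dichotomously scored items: the conditional probability of success in item b given success in item a, and an implication intensity based on the number of counterexamples (success in a, failure in b) compared with the number expected under independence. The intensity uses a normal approximation as commonly described for this type of analysis; whether CHIC computed it in exactly this way for the present data is an assumption, and the function name and toy data are hypothetical.

```python
import math


def implication_summary(a, b):
    """Summarise the implication 'success in a -> success in b' for 0/1 score lists."""
    n = len(a)
    n_a = sum(a)            # number of children succeeding in item a
    n_not_b = n - sum(b)    # number of children failing item b
    if n_a == 0 or n_not_b == 0:
        raise ValueError("need at least one success in a and one failure in b")
    # Counterexamples to the implication: success in a but failure in b.
    counterexamples = sum(1 for x, y in zip(a, b) if x == 1 and y == 0)
    expected = n_a * n_not_b / n  # expected counterexamples under independence
    q = (counterexamples - expected) / math.sqrt(expected)
    intensity = 0.5 * math.erfc(q / math.sqrt(2.0))  # equals 1 - Phi(q)
    conditional = (n_a - counterexamples) / n_a      # P(success in b | success in a)
    return intensity, conditional


# Hypothetical toy data: every child who solves the harder item also solves the easier one.
hard_item = [1] * 20 + [0] * 80
easy_item = [1] * 50 + [0] * 50

intensity, conditional = implication_summary(hard_item, easy_item)
print(round(intensity, 3), conditional)
```

In this toy example there are no counterexamples, so the conditional probability is 1 and the intensity lies well above the 85 % threshold used for the diagrams.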

The results for the Netherlands sample are shown in Fig. 1 on the left. The implicative diagram shows that for 11 out of 13 items implicative relations were found; no relations could be identified only for IPT1Umbrella and IPT2Tree. In general the implicative diagram shows that children’s success in an appearance item generally implies success in a visibility item. There is one exception to this pattern, which is the relationship between IPT1Duck and IPT2Fence. Specifically, the students who in IPT1Duck could imagine what the duck sees from a hole in the ground when looking up, could also determine in IPT2Fence at what distance from the fence the appearance of the bird would be as depicted. This means that although IPT2Fence is an IPT2 item, it appeared to be less complex than the IPT1 item IPT1Duck. Maybe the higher complexity of IPT1Duck has to do with the unusual direction of looking. Looking up from below might be more difficult to imagine than the peeking through a hole horizontally which is required for IPT2Fence. The difficulty of this looking-up perspective might also explain why IPT2Table was so difficult.

Fig. 1 Implicative diagram of the IPT type 1 items and the IPT type 2 items based on the responses of the kindergartners in the Netherlands (left) and Cyprus (right). (a) M = proportion of correct answers. (b) Probability of a correctly identified implicative relationship: ° p > .85, * p > .90, ** p > .95. (c) Conditional probability; for example, .70 means that of the 21 % of students who answered IPT2Table correctly, 70 % answered IPT1Duck correctly

A closer look at the implicative relations of the appearance items further supports the lower level of complexity of IPT2Fence. Specifically, children who successfully solved IPT2Table, IPT2Mouse, IPT2Soccer, and IPT2Cucumber performed well in IPT2Fence. The implicative analysis also revealed that success in IPT2Mouse implied success in IPT2Soccer, indicating that the former was more complex than the latter. In both items children were asked to take a bird’s-eye view to visualize how something would look from above. What differentiates the items is that in IPT2Soccer (M = .41) the possible answers all include the components of the soccer scene from a bird’s-eye viewpoint, so that the children only have to decide about the components’ positions in space, while in IPT2Mouse the front view of the body of a sitting mouse was shown and the possible answers showed a mouse from different perspectives and in different postures. As a result, the proportion of children (M = .44) who selected the drawing showing the whole mouse was greater than the proportion of children (M = .28) who selected the correct drawing in which the body of the mouse was not visible.

The implicative relations within the visibility items show that children’s success in IPT1Basket and IPT1Crossing implies success in IPT1Hole, IPT1Tower, and IPT1Wall. This could mean that some visibility items include characteristics making them cognitively more complex than other visibility items, or that the context of some of these items is less close to children’s everyday life than that of other visibility items. For example, as explained above, IPT1Hole is quite familiar and cognitively easier, and as a result the children understand well that the closer you are to the door hole, the more you can see through it. In addition, an item characteristic that may help children to take a different perspective in visibility items is the inclusion of a panoramic view in the drawing, which applies to IPT1Tower and IPT1Wall. By contrast, a view taken from above, as in IPT1Crossing, seems to hinder children in taking a particular perspective. The position of an obstacle in the sight line might also increase the item complexity. For example, in IPT1Basket, though the basket is between Jip’s eyes and the ball, many children (M = .30) did not seem to consider it an obstacle for her to see the ball.

The results for the Cyprus sample are shown in Fig. 1 on the right. Again for 11 of the 13 items implicative relations were found, but here the items for which no relations could be identified were IPT2Mouse and IPT2Soccer. The implicative diagram based on the Cyprus sample also shows that children’s success in an appearance item implies success in a visibility item. Similarly to the implicative diagram of the Netherlands sample, one implication is not in line with this pattern; that is, success in IPT1Duck implies success in IPT2Fence.

For the visibility items, we found in the Cyprus sample that IPT1Umbrella, IPT1Duck, and IPT1Basket imply success in IPT1Hole, IPT1Tower, and IPT1Wall, indicating that the former items are more complicated than the latter ones. For IPT1Duck and IPT1Basket, which also imply success in other visibility items in the Netherlands sample, we already gave a reason for their complexity. As for IPT1Umbrella, which is not in the Netherlands implicative diagram, its complexity may be a consequence of the bird’s-eye view that is required as well as the absence of a looking subject.

In sum, the implicative diagrams for the data of the Netherlands and the Cyprus samples show a similar pattern for the relationship between the two types of IPT items. In both diagrams, most of the visibility items are placed below the appearance items, indicating that success in the latter items implies success in the former items. However, certain other characteristics of the items, for example an unusual direction of looking as in IPT1Duck, may reverse this implicative relation.

Another common pattern between the two diagrams appears also in the implicative relations within the visibility items. In both diagrams, some visibility items have features making them cognitively more complex than other visibility items, such as a particular direction of looking (from above as well as from below) and having an obstacle between the viewer and the object to be seen. However, implicative relations for the appearance items were only found in the Netherlands sample; for example, as discussed earlier, the implicative relation between IPT2Mouse and IPT2Soccer. A possible reason for not finding implicative relations within the appearance items in Cyprus might be the children’s rather poor performance in almost all of these items.

6.3 Relationship between children’s characteristics and their performance in the two types of IPT items

To investigate a possible relationship between performance on the IPT items and kindergartners’ characteristics, including kindergarten year, gender, mathematics ability, and country (predictors), and to investigate the difference between the two types of IPT items, we carried out a two-level regression analysis in which the scores for the two IPT types are nested within each child. The regression model was specified as a linear mixed effects model in the lme4 software (Bates et al. 2014) and estimated for each country separately. The predictors IPT type (IPT), kindergarten year (Year), and gender were included as contrast variables (coded with ½ and −½, respectively) representing the difference between IPT type 2 and IPT type 1, K2 and K1, and girls and boys. Hence, the regression coefficients indicate the difference between the respective groups. Mathematics ability (Math) was used as a linear predictor in the regression model, and the value of 2.5 (the middle of the ability levels 4, 3, 2 and 1, see Table 2) was subtracted from the original values. Table 4 shows the results of the regression analysis.
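The contrast coding described above can be sketched as follows. The sketch only illustrates how the predictors are coded and why, with ±½ coding, a regression coefficient equals the difference between the two groups; it does not reproduce the mixed model itself, and the random child effect is omitted. The function names and the intercept of .50 are hypothetical, while the coefficient of −.28 is the IPT type effect reported for the Netherlands sample.

```python
def code_observation(ipt_type: int, year: str, gender: str, math_level: int) -> dict:
    """Code one child-by-IPT-type observation with the +1/2 and -1/2 contrasts."""
    return {
        "IPT": 0.5 if ipt_type == 2 else -0.5,        # type 2 vs type 1
        "Year": 0.5 if year == "K2" else -0.5,        # K2 vs K1
        "Gender": 0.5 if gender == "girl" else -0.5,  # girls vs boys
        "Math": math_level - 2.5,                     # centred ability level (1 to 4)
    }


def fixed_part(intercept: float, coefs: dict, x: dict) -> float:
    """Fixed part of the linear predictor; the random child effect is left out."""
    return intercept + sum(coefs.get(name, 0.0) * value for name, value in x.items())


coefs = {"IPT": -0.28}  # IPT type effect reported for the Netherlands sample
intercept = 0.50        # hypothetical value, for illustration only

type1 = fixed_part(intercept, coefs, code_observation(1, "K2", "girl", 3))
type2 = fixed_part(intercept, coefs, code_observation(2, "K2", "girl", 3))
print(round(type2 - type1, 2))
# prints: -0.28
```

Because the two codes are exactly one unit apart, the predicted difference between IPT type 2 and IPT type 1 equals the coefficient, whatever the intercept.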

Table 4 Regression analysis predicting IPT scores of the Netherlands (N = 334) and Cyprus samples of kindergartners (N = 304)

For the Netherlands sample, 39.5 % of the total variance in the kindergartners’ IPT scores was explained by the predictors included in the regression model (R² = .395). Most of the explained variance can be attributed to the predictor IPT type (IPT: B = −.28, SE = .02, t = −18.40, p < .001, η² = .353), indicating that the kindergartners’ performance in the IPT type 2 items was considerably lower, namely 28 percentage points, than in the IPT type 1 items. Kindergarten year was also a significant predictor in the regression model (B = .04, SE = .02, t = 2.01, p = .044, η² = .001), while this was not the case for gender (B = .00, SE = .02, t = −0.15, p = .879, η² < .001). Mathematics ability, however, was significantly related to IPT performance (B = .06, SE = .01, t = 7.05, p < .001, η² = .036), meaning that for two children with a difference of one in mathematics ability level, there was an average difference of 6 percentage points in item success rates. Furthermore, we found a significant interaction effect only for IPT type and mathematics ability (B = −.03, SE = .01, t = −2.02, p = .043, η² = .003), meaning that the influence of mathematics ability was stronger on the success rate for IPT type 1 items than for IPT type 2 items.

In the Cyprus sample, the explained variance was a bit lower than in the Netherlands sample (R² = .311). Similar to the Netherlands, the IPT type 2 items were significantly harder than the IPT type 1 items (B = −.22, SE = .02, t = −11.42, p < .001, η² = .277). Kindergarten year and gender were not significant predictors. Yet, similar to the Netherlands sample, children’s mathematics ability (which was only based on the teachers’ perceptions and not measured by a standardized test) significantly predicted their IPT performance (B = .04, SE = .01, t = 4.02, p < .001, η² = .020). Moreover, there was a significant interaction effect of IPT type and gender (B = .08, SE = .03, t = 2.65, p = .008, η² = .007), indicating that the difference between the two IPT types was smaller for girls. The average difference between IPT type 1 and IPT type 2 amounted to 22 percentage points; for girls the difference was 18 percentage points (= −22 + 8 × ½) and for boys it was 26 percentage points (= −22 + 8 × −½). The interaction of IPT type and mathematics ability was marginally significant and the other interaction effects were not significant.
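The arithmetic behind these group-specific differences follows directly from the ±½ coding: the effect of one predictor at a given level of the other is the main effect plus the interaction coefficient times the contrast code. A minimal sketch, using the rounded coefficients reported in the text (the function name is hypothetical, and results based on rounded coefficients are approximate):

```python
def conditional_effect(main_effect: float, interaction: float, moderator_code: float) -> float:
    """Effect of one predictor at a given contrast code of the moderating predictor."""
    return main_effect + interaction * moderator_code


# Cyprus: difference between IPT type 2 and type 1, for girls (+1/2) and boys (-1/2).
type_diff_girls = conditional_effect(-0.22, 0.08, 0.5)
type_diff_boys = conditional_effect(-0.22, 0.08, -0.5)

# Netherlands: slope of mathematics ability for IPT type 1 (-1/2) and type 2 (+1/2).
math_slope_type1 = conditional_effect(0.06, -0.03, -0.5)
math_slope_type2 = conditional_effect(0.06, -0.03, 0.5)

print(round(type_diff_girls, 2), round(type_diff_boys, 2))
# prints: -0.18 -0.26
print(round(math_slope_type1, 3), round(math_slope_type2, 3))
# prints: 0.075 0.045
```

The same computation shows why the negative interaction in the Netherlands sample corresponds to a stronger influence of mathematics ability on IPT type 1 items than on IPT type 2 items.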

7 Concluding remarks

Our study revealed that kindergartners in the Netherlands and Cyprus answered on average respectively 70 and 55 % of the visibility items correctly, and 40 and 30 % of the appearance items correctly (Research question 1). For the visibility items our findings are more or less in agreement with Flavell et al. (1981), but not for the appearance items. For these items, our results are more in line with Frick et al. (2014) who found that 4-year-olds responded near chance level. However, there is one restriction to be taken into account when comparing our results with these other results. In our study we worked with drawings representing the objects and the environment in which the objects (and sometimes also the observer) were situated, while the other studies situated the perspective-taking tasks in concrete situations mostly with physical objects.

The aforementioned differences between the children’s performances in the visibility items and the appearance items make it clear that development of the IPT type 1 competence (visibility) precedes the IPT type 2 competence (appearance) (Research question 2). This is confirmed by the identified implicative relationships among the items, showing that children’s success in an appearance item generally implied success in a visibility item. However, the implicative analysis also revealed that the difficulty level of an item might differ as a result of specific item characteristics, including cognitive demand (direction of looking; spatial location of observing subject), representation (observing subject included in the drawing or not; bird’s-eye view or panoramic view), and context (familiar or not).

When investigating the relation between children’s characteristics and their performance in the two types of IPT items (Research question 3), our regression model showed that IPT type was the most influential predictor of the children’s IPT scores in both the Netherlands and Cyprus, resulting respectively in a 28 and 22 percentage points lower score for the IPT type 2 items than for the IPT type 1 items. Regarding the children’s characteristics, in both countries mathematics ability was significantly positively related to the kindergartners’ IPT performance, which extends the findings of Guay and McDaniel (1977), who did not study children at kindergarten age and used physical objects in the perspective-taking tasks. For the Netherlands sample we found that the children in K2 significantly outperformed those in K1, while in Cyprus kindergarten year was not found to be a significant predictor of the IPT scores.

Results for gender were similar in both countries: in line with earlier studies (Liben 1978; Newcombe and Huttenlocher 1992), there was no significant effect on the IPT scores. In addition, in the Netherlands we found a significant interaction effect of IPT type and mathematics ability in favor of the IPT type 1 items, and in Cyprus there was a significant interaction effect of IPT type and gender, meaning that girls did relatively better than boys in IPT type 2.

Although the children in the Netherlands and Cyprus may have grown up in culturally different environments (northern vs southern Europe), we can conclude that the findings in the two samples (Research question 4) generally were quite similar, in accordance with a study by Knudson and Kagan (1977). In fact, the most striking difference was that the kindergartners in the Netherlands performed better on both IPT types than those in Cyprus. A first explanation could be that in the Cyprus sample most children were in separate classes, whereas in the Netherlands the K1 and K2 children were in integrated classes, which could mean that the younger children learn from the older ones. A further explanation could be found in the kindergarten curriculum, as indicated by studies of Huttenlocher et al. (1998) and Bishop (1973). However, in our study this is not the most obvious conclusion, since in both countries spatial reasoning is assumed to be part of the mathematics program in kindergarten, which of course does not mean that this topic is adequately implemented by the teachers. Another explanation for the performance difference might be that the children in Cyprus are not familiar with class-administered paper-and-pencil testing.

This brings us back to the considerable variability found in IPT as a function of stimuli and testing condition (Newcombe et al. 2013). To overcome this limitation and obtain a more robust understanding of the ability of IPT, further research is needed in which the type of tasks and the presentation of IPT situations are systematically varied and more items are used to measure IPT. Furthermore, future studies should include a more in-depth analysis of possible differences in the cultural and educational environment of children.