Natural selection is a fundamental mechanism of evolution, the unifying principle of biology. It is central to understanding the functional specialization of living things, the origin of species diversity and the inherent unity of biological life. In turn, a comprehensive understanding of natural selection is critical for understanding and responding to some of the most pressing issues of our time, for example, the biological impacts of climate change. However, despite its importance, years of research indicate that natural selection remains one of the most misunderstood concepts in contemporary science (see Gregory 2009, for review). These misunderstandings often follow predictable patterns and show resistance to instruction, not only persisting among high school students and undergraduates–who are the usual recipients of comprehensive mechanistic teaching on evolution–but troublingly, also among many of the teachers expected to teach them (e.g., Nehm and Schonfeld 2007; Nehm et al. 2009; Rachmatullah et al. 2018).

Historically, discussion of the source of students’ misunderstandings about natural selection has tended to focus on socio-cultural or motivational factors, for example, poorly worded textbooks, media, or resistance to learning about evolutionary processes due to religious beliefs (e.g., Aldridge and Dingwall 2003; Jungwirth 1975; Rutledge and Warden 2000; Sinatra et al. 2008). More recently, however, developmental and learning scientists have suggested that these mistaken ideas may have roots in a more universal source, specifically, a suite of cognitive biases that routinely emerge, across cultural contexts, in early child development (see chapters in Rosengren et al. 2012). These cognitive tendencies have been found to affect children’s early reasoning about a diverse range of social, living, and non-living natural phenomena, and research strongly points to their role in older students’ evolutionary misunderstandings (e.g., Coley and Tanner 2012, 2015; Evans 2008; Kelemen 2012, 2019; Samarapungavan and Wiers 1997; Shtulman and Schulz 2008).

For this Special Issue, we focus on one of these cognitive biases, the teleological bias—the tendency to account for phenomena by reference to a putative function or purpose—and explore three primary questions. First, given the early emergence of intuitive biases that make natural selection hard to learn in adolescence and adulthood, can the basic mechanism of adaptation by natural selection be introduced far earlier, in elementary classrooms, before intuitive misunderstandings become sufficiently automatic that they undermine accurate learning and reasoning about the process of natural selection? Second, what kinds of preconceptions do children exhibit about adaptation during the early elementary period, prior to any instruction? Third, to the extent that children actually display explicitly teleological preconceptions, do these have differentially greater impacts on their subsequent learning about natural selection than other kinds of preconceptions?

Natural selection understanding and teleological misunderstandings in older students

Adaptation by natural selection is the cumulative population-based mechanism by which species evolve specialized traits. By virtue of random variation within a population, some individuals have heritable traits that are more functionally advantageous in an environment, and those traits come to predominate as those group members out-survive and out-reproduce others over multiple generations. Under this causal mechanism, the existing environmental functionality of a trait is key insofar as it increases the fitness of some individuals within a population and thus influences which traits are passed on to future generations. However, when older students are asked why a species has a specialized trait, most display non-mechanistic or mechanistically inaccurate views that reflect a fundamental misunderstanding of the role of function in natural selection. Frequently, students’ explanations instead converge on a general purpose-based or “teleological” pattern in which the current functionality of a trait (or a species’ need for that trait’s functionality) is stated as the only prerequisite required to explain why a species has evolved that property (e.g., Gregory 2009). Classic examples of these basic teleological explanations (TE) include claims like “giraffes evolved long necks so that they can feed from the tops of trees” and “anteaters have long noses because they needed them to suck up ants.” In these unelaborated forms of teleological misunderstanding, then, there is no reference to an antecedent causal-historical mechanism (Footnote 1). At best there is mild invocation of an antecedent cause, specifically a survival need (e.g., the need for food or defense), as the condition driving biological change. In consequence, reference to the undirected, population-based mechanism by which evolutionary change actually occurs is absent (Footnote 2). It is as if students view natural selection as a goal-directed transformative event in which nature magically changes living things, or individual living things purposefully change themselves, so that the entities acquire heritable traits that allow them to secure their own survival (Chi 2009; Kelemen 2012; Shtulman 2006).

These unelaborated teleological misunderstandings may be basic, but they are challenging to overcome. In mis-categorizing natural selection as a goal-directed event rather than as a mechanistic process, students also tacitly focus on a trait’s functional benefit to individual animals rather than population variability as the engine of change. Furthermore, the tendency to construe individuals as essentially uniform in their functional needs—or in their capacities to transform in response to those needs—effectively shuts down the kinds of representations of variability that make a mechanistic understanding of natural selection possible (Emmons and Kelemen 2015; Shtulman 2006).

Having noted this, it is an open question whether these kinds of basic, unelaborated teleological misunderstandings represent the worst-case scenario for scientific learners. This is because some students not only possess these core misunderstandings but also actively expand upon these ideas and elaborate them with inaccurate mechanisms that explicitly reference goal-directed processes of change, even showing signs of borrowing from the domain of intention-desire psychology to explain biological outcomes. Examples of these elaborated forms of reasoning include: effort-based claims that giraffes evolved long necks because they stretched them as they tried to reach to the tops of trees and anthropomorphic or agency-based claims that Nature or Evolution changed giraffes so that they could survive (Ferrari and Chi 1998; Gregory 2009).

These more elaborated causal-mechanistic claims share all of the challenges of more basic teleological misunderstandings but, arguably, may be harder to conceptually restructure or suppress when learning accurate alternatives because they reflect more detailed misunderstandings of evolutionary process (Kelemen 2012). In this study, we were therefore interested not only in documenting the prevalence and correlates of different kinds of explicitly teleological preconceptions when children are engaged in teacher-led classroom learning but also in examining their relative impact on children’s capacities to learn and generalize from a storybook offering an accurate alternative. Tables 1 and 2 lay out a typology that distinguishes explicit elaborated teleological misunderstandings from more basic unelaborated ones. We also contrast them with common misunderstandings that may present different or reduced learning challenges because, while they are somewhat mechanistically elaborated, they have no explicit, linguistically marked teleological component. Their teleological content is therefore more uncertain and ambiguous (Footnote 3).

Table 1 Common misconceptions about natural selection
Table 2 Breakdown of focal misconceptions

The current study

In summary, the present study addressed three main questions. First, we examined whether it is viable to introduce children to the fundamentals of adaptation by natural selection using a teacher-led intervention in elementary school classrooms. In exploring this question, we built from prior research indicating that 7- to 8-year-old children can learn and generalize the theory of adaptation by natural selection from limited interventions that combine custom explanatory picture storybooks with talk aloud explanation requests. These earlier studies found that the storybook, How the Piloses Evolved Skinny Noses (Kelemen and The Child Cognition Lab 2017), can help children learn in the context of controlled, researcher-led one-on-one sessions (Kelemen et al. 2014; Emmons et al. 2016; Emmons et al. 2018). In the current research—for the first time—we instead explore children’s learning outcomes when the storybook intervention: (1) is teacher led; (2) occurs in authentic public school classroom settings that incorporate a hands-on simulation activity, and (3) involves written evaluation materials rather than individual talk aloud explanation protocols because the former are more suitable for widespread use in classrooms. Determining whether the storybook intervention and assessments lead to increased understanding of adaptation even with these changes is critical to understanding whether such an intervention is a promising scalable method for teaching the fundamentals of natural selection in early elementary school.

The second goal of this study was to better understand the nature of children’s preconceptions. Specifically, we investigated the prevalence of teleological reasoning in children’s explanations of adaptation by natural selection. Teleological reasoning emerges early in childhood and pervades much of children’s thinking (e.g., Kelemen 2004). For instance, young children not only often prefer teleological explanations over mechanistic explanations when reasoning about biological natural phenomena (e.g., Keil 1992) but also show this preference when reasoning about non-biological natural phenomena. That is, across cultures, children will endorse claims that entities like pointy rocks exist for a purpose (e.g., so animals can scratch their backs on them) over more scientifically-based claims that they arise from mechanical processes like erosion (Kelemen 1999, 2003; Kelemen and DiYanni 2005; Schachner et al. 2017; see also Kampourakis et al. 2012b). Despite this work on children’s reasoning about nature, most research concerned with the relation between teleological reasoning and natural selection understanding has focused on adults. Although some research has found evidence that elementary school-aged children express purpose-based ideas when reasoning about biological origins and evolution (e.g., Evans 2008; Emmons et al. 2018; Kampourakis et al. 2012a; Samarapungavan and Wiers 1997; Shtulman et al. 2016), to our knowledge, the present study constitutes the first detailed analysis of the nature of children’s teleological preconceptions and their role in young children’s intuitive ideas about trait evolution prior to instruction on adaptation. To explore this issue, we examined the relative frequency of explicit basic teleological versus elaborated teleological explanation at pretest to determine how often children’s teleological reasoning is overtly underpinned by incorrect causal mechanisms.

Third, we explored the possibility that teleological reasoning represents a distinctive challenge to young children’s learning about natural selection given the salience of such misunderstandings in adult research (Gregory 2009). Past research suggests that teleological reasoning presents a significant barrier to accurately understanding natural selection (e.g., Barnes et al. 2017; Gregory 2009; Kampourakis 2018; Kelemen 2012; Nehm 2018), but, once again, little of this work has examined this issue in children. One prior study explored how different narrative forms, including those with teleological language, influence children’s evolution concept learning (Legare et al. 2013). However, this study tested whether exposure to teleological language in a short narrative passage impacted learning of individual conceptual components of a selectionist explanation, rather than testing whether children who spontaneously self-generate teleological explanations are at a particular disadvantage for learning and applying the overall logic of natural selection. By contrast, in the current research, our focus was on whether children who produced explicit teleological misunderstandings at pretest were more or less likely to learn natural selection from a classroom storybook intervention than those who generated other more ambiguous kinds of preconceptions. We also broke teleological explanations into sub-categories to examine whether children with elaborated teleological misunderstandings learned less than children with basic teleological misunderstandings.



Participants were second and third graders drawn from a public school district in southwest Massachusetts. This school district included five elementary schools that served 2362 students. At the time of the study, the reported district demographics were as follows: approximately 69% white; 14% Hispanic; 6% African American; 5% Asian; 0.3% Native American; 0.3% Native Hawaiian or other Pacific Islander; and 6% multiracial and non-Hispanic. Thirteen percent of the students in the district spoke a language other than English as their first language; 6% of the students in the school district had some sort of disability, and approximately 32% of the students in the district were considered economically disadvantaged. In the United States, the typical age range for children in second and third grade is 7 to 9 years of age.

Six teachers from four schools agreed to use the materials in their classrooms. Two hundred seventy-two students in 12 classrooms participated. However, 15 students did not complete either the pretest or the posttest assessment due to absences. Data from an additional 37 participants was excluded because the children skipped too many questions in either the pretest or the posttest to be coded. The final sample included 220 children. The majority of these participants were third graders (n = 182) and the remainder were second graders (n = 38).

Materials and procedure

All classroom activities—including the pretest, book reading, activity, and posttest—were completed at the teachers’ convenience over the course of 2 to 3 days.

Professional development

Prior to the study, two of the authors (SR and DK) led a brief 2- to 3-hour professional development session for interested teachers and science coordinators from the district. Four of the six teachers who participated in the study attended this session along with the STEM coordinator for the school district (and an additional teacher who did not conduct the intervention). The two participating teachers who did not attend the professional development session were given background by those who did attend.

During the session, the teachers completed one of the assessment packets in order to assess their own knowledge and misconceptions about natural selection. After completing the packets, they were offered an accurate explanation of natural selection and information about common misconceptions. Teachers were also introduced to the storybook and its use in a classroom. Finally, teachers were walked through the practical dynamics of the natural selection simulation activity. In addition to this professional development session, teachers had access to our Evolving Minds Project website where they could view additional materials, including the pointing guide that showed how to use informative gestures to help children follow along when reading the book (see

Pretest and posttest packets

Classroom teachers were asked not to help students answer any questions. This study differed from previous studies involving the storybook intervention in that students’ understanding of natural selection was assessed via paper-and-pencil worksheets rather than a structured talk aloud interview with a trained researcher. However, these paper-and-pencil packets were designed to be similar in structure to the explanation-eliciting interviews used in past studies (see Kelemen et al. 2014; Emmons et al. 2016 and 2018).

Two 12-page assessment packets were created for use as counterbalanced pre- and posttests. They presented adaptation scenarios with entirely parallel deep structure except that one assessment packet required participants to reason about a realistic but fictional group of cat-like mammals (“tardons”) that came to have longer tails over time while the other concerned realistic but fictional monkey-like mammals (“orpeds”) that came to have longer arms over time. In both cases, the change in the population was related to a change in the location of their food. In the “tardon” case, the melons that the animals ate started to grow only on the highest and least accessible tree branches. In the “orped” case, the fish that the animals ate started to swim only near the bottom of a deep river. Our past talk aloud studies have found that elementary school aged children show strong pre- to posttest learning with these counterbalanced food pressure scenarios when a storybook intervention is conducted by trained researchers (and when children also complete a storybook comprehension posttest not included in the present circumscribed intervention research). Samples of writing packets can be downloaded from the Evolving Minds project website at

On the first page of each packet, participants were introduced to the animal species (see Fig. 1). They saw a picture of the population many hundreds of years ago (e.g., orpeds with mostly shorter arms) plus a picture of its ancient environment (e.g., a beach by a river full of fish). Participants also saw a picture of the current population (e.g., orpeds with mostly long arms) and its current environment (e.g., a beach by a river with fish on the river floor). A brief description of the change in the environment was then provided (e.g., fish used to swim all over the river but now they swim near the bottom).

Fig. 1 The first page of the assessment packet

Open-ended questions about natural selection:

After reading the description of the scenario, children were then prompted to write an explanation of why the species changed (e.g., why the orpeds went from mostly having shorter arms to longer arms) in at least five sentences.

On subsequent pages, participants saw the same two paired images of the past and present orped populations and their environments and read further prompts to explain what happened to the orpeds with shorter arms and to the orpeds with longer arms (e.g., “what happened to the orpeds with shorter arms?” and “did anything else happen to the orpeds with longer arms?”). These open-ended prompts followed the protocol of talk-aloud interviews used in earlier studies, where they were included to circumvent children’s tendencies to abbreviate their answers. However, in contrast to past studies, where this sequence of questions came at the end of the talk-aloud protocol, these open-ended questions were the first items that participants encountered in the packet. We anticipated that these questions would be the most time-consuming and tiring, and we hoped that putting them first would give participants the best opportunity to answer them fully.

Close-ended isolated fact questions and justifications:

The following four pages included close-ended isolated fact questions with requests for justification. These included two questions about differential survival, two questions about differential reproduction, a question about inheritance of traits, and a question about trait constancy (see Table 3). For each question, children were asked to circle their answer and then to justify their answer.

Table 3 Closed-ended isolated fact questions with sample justifications

Environmental change integration questions:

There were an additional four forced-choice questions that participants answered about the relationship between the past or present environment and reproduction. These purely yes/no questions were exploratory and it later became clear that, in the absence of participants’ justifications, we could not differentiate between children who had a correct understanding of natural selection and children who had misconceptions. We therefore do not discuss them further.

Misconception recognition judgments:

As an addition to prior talk aloud interview protocols, on the last two pages of the packet, children saw three cartoon drawings of children. Each child had a speech bubble that contained a misconception about natural selection (see Table 4 for examples of these misconception prompts). Participants were told that the explanations could all be wrong, all be right, or some could be right and some could be wrong. In reality, all three explanations were incorrect. Children were asked to judge whether each explanation was right or wrong and to justify their answer.

Table 4 Examples of misconception prompts


After the administration of the pretest packets, teachers implemented the storybook intervention. The storybook, How the Piloses Evolved Skinny Noses, was designed to teach adaptation by natural selection to children as young as 5 years old, and to directly challenge individual-level teleological or intentional misunderstandings about adaptation. In consequence, the non-anthropomorphic pictures and language in the book carefully avoid any teleological or intentional connotations. The book follows a population of a realistic but fictional anteater species (piloses) before and after a major climate change. After the environment changes to become extremely hot, the piloses’ insect food moves from living above ground to living only in deep, thin underground tunnels. As a result, rare individuals in the population that have skinnier noses end up having a differential advantage in hunting for food, which leads them to be healthy, live longer, and reproduce more than animals that have wider noses. Over multiple reproductive generations, individuals with thinner trunks therefore come to predominate. The pattern of adaptation that is depicted thus challenges heuristic assumptions that “bigger is better” or that traits inevitably increase rather than reduce during the process of evolution (see Nehm and Ha 2011, on older students’ difficulties reasoning about trait loss versus gain; see also Frejd 2019, for the importance of variation and death depictions in the book).

Research has found that the coherent, mechanistic explanation of adaptation that gradually unfolds in the book is an effective way to teach natural selection. In particular, 7- to 8-year-old children show marked capacities to both learn and apply the mechanism across generalization scenarios with various surface features (e.g., mammals, birds), selection pressures (e.g., food, predation) and trait changes (reductions and increases in size) (see Kelemen et al. 2014; Emmons et al. 2016 and 2018; see Brown and Kelemen 2020, for learning in adults).

Instead of the traditional storybook, one classroom (n = 19 students) viewed an animated video based on the storybook. In this minimally animated version, the storybook is read aloud, and as each page is presented, parts of the image that would be the focus of a teacher gesture in a live presentation of the print storybook are highlighted with some movement on the screen (e.g., as they are referenced, individual piloses shake slightly to draw children’s attention). Participants who received the animated storybook did not show statistically significant differences from those who received the traditional storybook, so we collapsed across book presentation method (but see Ronfard et al. 2020a, for research explicitly comparing children’s learning from print versus animated storybooks on adaptation and speciation).

Natural selection simulation activity

All teachers chose to perform a hands-on simulation activity after children had listened to the storybook and before the administration of the posttest. This act-out activity was designed to reinforce the ideas presented in the piloses storybook especially as, in contrast to prior storybook intervention research, children were not prompted to explain the book in a comprehension posttest once they had listened to it. Instead, in this more circumscribed intervention, their only formal posttest measured their capacities for transfer to a new species. In the simulation activity, students were each assigned an individual from the piloses population with either a wider trunk or a skinnier trunk. They then discussed whether their individual would be able to catch food, live a long life, and reproduce. They were then told—as in the story—that after the weather change, some piloses with wider trunks were able to have one child while others had no children, but piloses with skinnier trunks were healthy and had two children. Children acted out this differential reproductive success by selecting different numbers of offspring and subsequently repeating this process in two additional reproductive generations. In order to create an external visual model of the proportional trait change, for each generation, each class tallied and graphed how many piloses had skinny noses and how many had wide noses. Materials and instructions for conducting this activity can be found at
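The generation-by-generation tally from the simulation activity can be sketched in a few lines of Python. This is only an illustrative model, not the classroom materials themselves. It assumes that offspring inherit their parent's nose type, that a wide-nosed individual has one child or none with equal probability (the activity states only that some had one child and others had none), and that the starting counts are hypothetical.

```python
import random


def next_generation(population, rng):
    """Return the offspring produced by one generation of piloses."""
    offspring = []
    for nose in population:
        if nose == "skinny":
            n_children = 2  # skinny-nosed piloses have two children
        else:
            # Wide-nosed piloses have one child or none.
            # The 50/50 split is an assumption made for this sketch.
            n_children = rng.choice([0, 1])
        # Assumption: each child inherits its parent's nose type.
        offspring.extend([nose] * n_children)
    return offspring


def simulate(n_skinny, n_wide, generations=3, seed=0):
    """Tally (skinny, wide) counts for the starting population and each generation."""
    rng = random.Random(seed)
    population = ["skinny"] * n_skinny + ["wide"] * n_wide
    tallies = [(population.count("skinny"), population.count("wide"))]
    for _ in range(generations):
        population = next_generation(population, rng)
        tallies.append((population.count("skinny"), population.count("wide")))
    return tallies


if __name__ == "__main__":
    # Hypothetical class: 4 skinny-nosed and 16 wide-nosed piloses.
    for gen, (skinny, wide) in enumerate(simulate(4, 16)):
        print(f"generation {gen}: skinny={skinny}, wide={wide}")
```

Under these assumptions the skinny-nosed count doubles every generation while the wide-nosed count can only stay level or shrink, which is the proportional shift the class graph is meant to make visible.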

Data coding

Teachers mailed the completed assessments back to the researchers along with their notes on the implementation of the materials. The researchers then coded children’s responses using an established coding system (e.g., Kelemen et al. 2014; Emmons et al. 2018). As in previous studies, participants were assigned a global score based on their overall understanding of natural selection across all questions on each assessment. Table 5 overviews the coding system.

Table 5 Conceptual checklist for NS understanding and sample open-ended responses

This global coding system is based on participants’ answers and justifications to the close-ended isolated fact questions plus their responses to the open-ended explanation prompts. Participants’ justifications for the close-ended isolated fact questions were coded as accurate or inaccurate. To receive credit for these questions, participants had not only to answer the initial forced-choice question correctly but also to provide an accurate response to the justification prompt (see Table 3). For the open-ended questions, participants’ responses were coded for their understanding of key concepts, including differential survival, differential reproduction, and multiple generations. In addition, participants’ responses to the close-ended fact questions and open-ended prompts were coded holistically for the presence of any misconceptions (see “Misconceptions” section).

Participants who answered fewer than five of the six close-ended isolated fact questions correctly were assigned to Level 1 (no understanding of natural selection (NS)) whether they demonstrated a misconception or not. Participants who answered at least five close-ended fact questions correctly but demonstrated a misconception or an inaccurate understanding of differential survival or reproduction were assigned to Level 2 (facts but no understanding of NS). Participants who answered at least five fact questions correctly and demonstrated an accurate understanding of differential survival were assigned to Level 3 (foundation for NS understanding). Participants who answered at least five fact questions correctly and demonstrated an accurate understanding of differential survival and differential reproduction were assigned to Level 4 (NS understanding in one generation). Finally, participants who answered at least five fact questions correctly and demonstrated an accurate understanding of differential survival, differential reproduction, and multiple generations were assigned to Level 5 (NS understanding in multiple generations). Any participant who inaccurately described differential survival or reproduction in their open-ended response or who demonstrated a misconception at any point on the assessment could score no higher than Level 2. Children who were assigned to Levels 3, 4, and 5 were considered to have a population-based understanding of natural selection, from basic (differential survival only; Level 3) to relatively sophisticated (differential survival and reproduction over multiple generations; Level 5).
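The level rules above can be expressed as a small decision function. The Python sketch below is one possible reading of those rules, not the authors' coding instrument. The function name, argument names, and the three-valued "accurate" / "inaccurate" / "absent" coding are hypothetical, and the treatment of a response that never mentions differential survival (placed at Level 2 here) is an assumption, since the text does not spell out that case.

```python
def assign_level(facts_correct, survival, reproduction,
                 multiple_generations, has_misconception):
    """Assign a global natural selection (NS) understanding level from 1 to 5.

    facts_correct: number of the six close-ended fact questions credited (0-6)
    survival, reproduction: "accurate", "inaccurate", or "absent" (hypothetical codes)
    multiple_generations: True if the response describes change over generations
    has_misconception: True if any misconception appears anywhere on the assessment
    """
    # Fewer than five credited fact questions: Level 1 regardless of anything else.
    if facts_correct < 5:
        return 1
    # Any misconception or inaccurate account caps the score at Level 2.
    if has_misconception or "inaccurate" in (survival, reproduction):
        return 2
    # Assumption: with no accurate account of differential survival,
    # the participant stays at Level 2 (facts but no understanding of NS).
    if survival != "accurate":
        return 2
    if reproduction != "accurate":
        return 3  # differential survival only
    if not multiple_generations:
        return 4  # survival and reproduction within one generation
    return 5      # survival and reproduction over multiple generations


if __name__ == "__main__":
    print(assign_level(6, "accurate", "accurate", True, False))
    print(assign_level(6, "accurate", "accurate", True, True))
```

Checking the cap before the concept checks mirrors the statement that a misconception at any point on the assessment limits the score to Level 2, however complete the rest of the explanation is.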

One researcher served as the primary coder and coded the entire dataset. Four secondary coders each coded 25% of the dataset. Reliability between the primary coder and secondary coders was excellent (kappas ranged from 0.899 to 0.935).
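For readers unfamiliar with the reliability statistic, the sketch below computes an unweighted Cohen's kappa for two coders, kappa = (p_o - p_e) / (1 - p_e), where p_o is the observed agreement and p_e is the agreement expected by chance. The text reports only "kappas", so the choice of the unweighted Cohen variant is an assumption, and the example ratings are invented for illustration rather than taken from the study.

```python
from collections import Counter


def cohens_kappa(coder_a, coder_b):
    """Unweighted Cohen's kappa for two equal-length sequences of category codes."""
    if len(coder_a) != len(coder_b) or len(coder_a) == 0:
        raise ValueError("both coders must rate the same non-empty set of items")
    n = len(coder_a)
    # Observed agreement: share of items given the same code by both coders.
    observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    # Chance agreement: product of the coders' marginal proportions, summed over codes.
    freq_a = Counter(coder_a)
    freq_b = Counter(coder_b)
    expected = sum(freq_a[code] * freq_b[code] for code in freq_a) / (n * n)
    if expected == 1:
        # Both coders used a single identical code throughout.
        return 1.0
    return (observed - expected) / (1 - expected)


if __name__ == "__main__":
    # Invented level codes (1-5) from a primary and a secondary coder.
    primary = [1, 2, 2, 3, 5, 4, 1, 2]
    secondary = [1, 2, 2, 3, 5, 4, 2, 2]
    print(round(cohens_kappa(primary, secondary), 3))
```

Because the levels are ordinal, a weighted kappa would also be a defensible choice; the unweighted form is shown only because it is the simplest.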


Misconceptions

Participants’ responses to the open-ended prompts and close-ended isolated fact questions were coded for several different kinds of misconceptions (Table 1). These misconceptions were not mutually exclusive; many participants demonstrated multiple misconceptions within the same assessment. For instance, a participant who expressed a basic teleological misconception in response to the open-ended prompt and a developmental misconception in response to a close-ended prompt would be coded as expressing both misconceptions. As noted above, we did not treat development and transformation misconceptions as explicitly teleological because these misconceptions did not include explicit reference to a goal, function, or need. However, we acknowledge these instances of transformation and developmental misconceptions may reveal implicit teleological reasoning. Because these transformation and developmental ideas may be construed as teleological, we label them as ambiguous rather than non-teleological.

Types of teleological reasoning

In addition to coding participants’ responses to the open-ended and forced-choice prompts for the clear presence of teleological misconceptions, we characterized the nature of that teleological reasoning. Our goal was to categorize participants’ explicit teleological responses as one of two types. In basic teleological reasoning, nothing more than a beneficial functional outcome, or a need for a beneficial outcome, was offered as the explanation for the trait change (e.g., piloses got skinny noses so that they could reach food). In elaborated teleological reasoning, the teleological explanation was accompanied by evidence of additional inaccurate causal assumptions. Examples of elaborated teleological reasoning include the belief that goal-directed effort motivated a functional or need-oriented change (e.g., piloses stretched their noses so they could have skinny noses) or that an external agent caused the change (e.g., Nature/God gave piloses skinnier noses). Responses identifying God as an agent of biological change were rare but were included in this category given that religious ideas do not fall within the domain of evidence-based scientific explanation.

At times, children succinctly expressed elaborated teleological misconceptions by combining a purpose- or need-based goal as well as a mechanism or agent of change within one sentence (e.g., “piloses with wider noses grew skinnier noses so that they could reach the bugs in the tunnels”). Other children tended to expand upon their ideas over the course of an assessment, adding new information in response to the series of prompts. To fully capture participants’ intuitions about natural selection and to avoid underestimating the number of children with elaborated teleological reasoning, we considered each assessment holistically. This allowed us to identify children who actively augmented basic teleological reasoning with additional causal mechanisms but who did so by mentioning a purpose-based rationale for the change in response to one question prompt and who described an inaccurate mechanism in response to a different prompt. Recall that we did not consider transformation and development misconceptions to be teleological unless the participant made explicit reference to function or need as the reason for the change. However, when these ideas co-occurred with basic teleological misconceptions, we considered that combination to be elaborated teleological reasoning. Thus, participants who expressed a basic teleological idea (piloses got skinnier noses so that they could reach the food) and an elaborated change mechanism (e.g., development: piloses got skinnier noses as they got older) on separate prompts within an assessment were coded as expressing elaborated teleological misconceptions along with participants who more straightforwardly expressed an elaborated teleological idea within a single written statement.

Note also that these two categories of teleological reasoning are mutually exclusive; a participant could only be coded as having either elaborated or basic teleological reasoning, and a code for elaborated teleological reasoning overrode a code for basic teleological reasoning. Thus, if a participant expressed an effort misconception in response to the open-ended questions and a basic teleological idea in response to a close-ended isolated fact question, that participant would be coded as expressing an elaborated teleological idea. Table 7 shows the outcome of this coding.
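The holistic coding rule described above can be sketched in code. This is only an illustration under stated assumptions, not the authors' coding procedure: the flag names and the function `code_assessment` are hypothetical, each prompt response is assumed to have been hand-coded already with zero or more flags, and effort- or agent-based responses without any purpose statement are treated like the other mechanisms, which this section does not specify.

```python
from typing import Iterable, Set

# Hypothetical flag names for hand-coded prompt responses.
PURPOSE = "purpose"  # explicit function- or need-based reason for the change
MECHANISMS = {"effort", "agent", "transformation", "development"}


def code_assessment(prompt_flags: Iterable[Set[str]]) -> str:
    """Assign one mutually exclusive category to a whole assessment.

    Flags are pooled across all prompts (holistic coding), and the
    elaborated code overrides the basic code.
    """
    seen: Set[str] = set().union(*prompt_flags)
    has_purpose = PURPOSE in seen
    has_mechanism = bool(seen & MECHANISMS)

    if has_purpose and has_mechanism:
        return "elaborated_teleological"
    if has_purpose:
        return "basic_teleological"
    if has_mechanism:
        return "ambiguous"  # assumption: mechanism without stated purpose
    return "none"


# Effort on an open-ended prompt plus a purpose on a closed-ended prompt.
example = code_assessment([{"effort"}, {"purpose"}])
```

Pooling the flags before categorizing is what lets a purpose stated on one prompt and a mechanism stated on another combine into a single elaborated code.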


Do students learn from a teacher-led natural selection storybook intervention?

Children’s overall learning from the teacher-led classroom intervention was examined in three ways: First, we investigated the misconception recognition prompts and whether children were better able to recognize incorrect explanations of adaptation after the storybook intervention. Second, we tested whether children’s abilities to construct an accurate mechanistic explanation for adaptation by natural selection improved after the intervention. Third, we tested whether children were less likely to express a misconception about natural selection after the intervention.

Did the storybook intervention improve children’s recognition of incorrect explanations?

As a likely result of the fact that the misconception recognition items appeared on the last page of a lengthy writing packet, not all children answered these questions at both test points; analyses are restricted to the 112 participants who completed all three items at both pre- and posttest. On average, children accurately judged 1.58 (SD = 0.87) of three incorrect explanations as wrong at pretest and 2.37 (SD = 0.82) explanations as wrong at posttest. A repeated samples t test revealed that this was a significant improvement, t(111) = 8.32, p < 0.001. Follow-up repeated-measures McNemar analyses showed that children were significantly more likely to accurately judge each form of incorrect explanation as wrong (all ps < 0.01).
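For readers who want a standardized effect size for this comparison, one can be derived from the reported statistics. The sketch below is not part of the original analysis: it assumes a standard paired-samples t test, uses the conventional paired-design effect size d_z = t / sqrt(n), and relies on the rounded means reported above, so the implied standard deviation of the difference scores is approximate.

```python
import math

# Values reported in the text.
n = 112                          # participants with complete data
t_stat = 8.32                    # t(111)
mean_pre, mean_post = 1.58, 2.37

# Paired-design effect size (assumption: standard paired t test).
d_z = t_stat / math.sqrt(n)

# Implied SD of the pre/post difference scores, from t = mean_diff / (sd_diff / sqrt(n)).
mean_diff = mean_post - mean_pre
sd_diff = mean_diff / d_z

print(f"d_z = {d_z:.2f}, implied SD of differences = {sd_diff:.2f}")
```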

Did the storybook intervention help children to construct an accurate, generalizable theory of adaptation by natural selection?

Overall, participants struggled to construct accurate explanations for adaptation at pretest: 85% of children (n = 186) were at Level 1 and displayed no accurate understanding of natural selection or its prerequisite facts. Only 2% of children (n = 4) expressed any level of population-based understanding of adaptation (Level 3 or higher; Fig. 2, left side). In contrast, by the posttest, only 32% of children remained in Level 1 and 53% of participants (n = 117) had abstracted and generalized a basic population-based understanding of natural selection to a novel case (Fig. 2, right side). Additionally, at pretest no participants scored in Levels 4 or 5, suggesting that no one accurately expressed a population-based explanation that included an understanding of differential reproduction. After the intervention, 61 participants (28%) accurately described differential survival and reproduction (Level 4) and a further 9 participants (4%) achieved the highest possible score (Level 5) for accurately describing differential survival and reproduction over multiple generations. A repeated measures ordinal logistic regression confirmed that participants scored higher at posttest generalization than at pretest, Wald χ² (1, N = 440) = 180.14, p < 0.001 (see Fig. 2). According to guidelines, an odds ratio greater than 5 indicates a large effect size equivalent to Cohen’s d > 0.8 (Chen et al. 2010). The odds ratio of performing better at posttest than at pretest was 15.01, 95% CI [10.11, 22.29], indicating a large effect of the storybook intervention.
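The reported odds ratio, confidence interval, and Wald statistic can be checked against one another. The sketch below is a consistency check, not the authors' analysis: it assumes the interval is a normal-approximation (Wald) interval on the log-odds scale with a critical value of 1.96, and that the Wald statistic is the squared ratio of the log odds ratio to its standard error.

```python
import math

# Values reported in the text.
odds_ratio = 15.01
ci_low, ci_high = 10.11, 22.29
z_crit = 1.96  # assumption: 95% normal-approximation interval

# A Wald interval is symmetric on the log scale, so its midpoint
# should recover the point estimate.
log_mid = (math.log(ci_low) + math.log(ci_high)) / 2
or_from_ci = math.exp(log_mid)

# Standard error of log(OR) implied by the interval width.
se_log_or = (math.log(ci_high) - math.log(ci_low)) / (2 * z_crit)

# Implied Wald chi-square with 1 degree of freedom.
wald_chi2 = (math.log(odds_ratio) / se_log_or) ** 2

print(f"OR from CI midpoint: {or_from_ci:.2f}")
print(f"Implied Wald chi-square: {wald_chi2:.1f}")
```

Under these assumptions the implied statistic comes out close to the reported 180.14, with the small gap attributable to rounding in the published interval.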

Fig. 2

Participants’ performance on pretest and posttest generalization assessments. Because of rounding, percentages do not always add up to 100. Level 1 = no isolated facts; Level 2 = isolated facts but no natural selection understanding; Level 3 = foundation for natural selection understanding; Level 4 = natural selection understanding in one generation; Level 5 = natural selection understanding for multiple generations

To better understand the kinds of learning that children experienced, we examined individual children’s shifts in their global scores. Inspection of Fig. 3 reveals that 66% of children in Level 1 at pretest improved, with 50% achieving a population-based understanding of natural selection at posttest. Approximately two-thirds (67%) of children in Level 2 at pretest achieved a population-based understanding at posttest. The remaining third either stayed at Level 2 (10%) or regressed to Level 1 (23%). All children who were at Level 3 at pretest maintained a population-based understanding at posttest, with 50% progressing to Level 4 or Level 5, demonstrating a more advanced understanding.

Fig. 3

Patterns of student learning as evidenced by the percentage of participants who changed their global level of understanding from pretest to posttest. Level 1 = no isolated facts; Level 2 = isolated facts but no natural selection understanding; Level 3 = foundation for natural selection understanding; Level 4 = natural selection understanding in one generation; Level 5 = natural selection understanding for multiple generations

Did the intervention reduce children’s general tendency to express misconceptions?

Misconceptions that were either explicitly teleological or ambiguous were very common at pretest, with 85% of children (188 individuals) demonstrating at least one clearly identifiable form of misunderstanding prior to instruction. Although explicitly teleological and ambiguous misunderstandings often co-occurred within a child’s written assessment, a greater percentage of children stated ambiguous misconceptions (transformation and development) than explicit teleological misconceptions both before and after the intervention, replicating patterns found in Emmons et al. (2018).

In contrast, only 23% of children (50 individuals) displayed any kind of misconception at posttest. A related-samples McNemar’s test revealed that participants were less likely to demonstrate a misconception at posttest than at pretest, χ² (1, N = 220) = 132.18, p < 0.001, OR = 0.014. Inspection of Table 6 shows that all categories of misconceptions were less frequent after the intervention.
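The McNemar result depends only on the children whose status changed between tests, and the reported figures are enough to recover those counts. The sketch below is a reconstruction under assumptions, not data taken from the paper: it assumes the statistic is the continuity-corrected McNemar chi-square and that the odds ratio is the ratio of the two discordant counts. The variable names are illustrative.

```python
# Values reported in the text.
N = 220
pre_yes = 188    # children with a misconception at pretest
post_yes = 50    # children with a misconception at posttest
reported_chi2 = 132.18


def mcnemar_chi2(b: int, c: int) -> float:
    """Continuity-corrected McNemar statistic from the discordant counts."""
    return (abs(b - c) - 1) ** 2 / (b + c)


# Cells: a = misconception at both tests, b = pretest only,
#        c = posttest only, d = neither test.
matches = []
for c in range(0, min(post_yes, N - pre_yes) + 1):
    a = post_yes - c
    b = pre_yes - a
    d = N - a - b - c
    if min(a, b, d) < 0:
        continue
    if abs(mcnemar_chi2(b, c) - reported_chi2) < 0.01:
        matches.append((a, b, c, d))

for a, b, c, d in matches:
    print(f"a={a}, b={b}, c={c}, d={d}, OR = c/b = {c / b:.3f}")
```

Under these assumptions a single table fits the marginal counts and the test statistic, and its discordant-pair ratio agrees with the reported OR of 0.014.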

Table 6 Percentage of participants (n = 220) who stated particular misconceptions at pretest and posttest

Interim summary

In sum, the teacher-led, classroom-based storybook intervention had a positive effect on all measures of children’s overall understanding of natural selection. Compared to pretest, children at posttest were more capable of recognizing an inaccurate explanation, more likely to generate and apply an accurate explanation of natural selection to a new case, and less likely to demonstrate any kind of misconception about adaptation.

Children’s teleological reasoning

The second goal of this study was to better understand the nature of children’s teleological reasoning about biological trait change. First, we examined the degree to which children’s teleological reasoning was elaborated or basic before and after instruction. Next, we examined individual differences. Specifically, we tested whether children who presented teleological misconceptions at pretest differed from others in their factual biological knowledge and level of written verbal expressiveness. The aim was to understand whether these individual differences might help explain variation in the degree to which children held teleological misunderstandings and explicitly conveyed these ideas in writing.

How frequent were basic versus elaborated teleological preconceptions among those children expressing a misunderstanding at pretest and posttest?

As Table 6 shows, 85% of children (n = 188) stated a misconception at pretest but only 23% stated one at posttest. Table 7 shows the pattern of data when children were coded into mutually exclusive categories as being either explicitly teleological (basic or elaborated) or ambiguous in their misunderstanding at each assessment. As described in the method, a child who stated an ambiguous explanation (e.g., transformation) and also an explicit basic teleological misconception within one assessment was categorized as having an elaborated teleological misunderstanding given that their overall logic combined an inaccurate purpose-driven assumption with an inaccurate mechanistic idea about biological change (see “Method” section).

Table 7 Percentage of participants who expressed explicitly teleological and ambiguous misconceptions

Table 7 confirms that children were more frequently ambiguous in their misconceptions at pretest than explicitly teleological. Nevertheless, explicit teleological explanations were still common. Approximately a third (n = 61, 32%) of the 188 children who stated a misunderstanding at pretest offered a teleological misconception. Among those with a teleological misconception, however, basic teleological reasoning was rare: only 18% expressed these unelaborated purpose-based misunderstandings. Instead, the majority of these children (82%) amplified their purpose-based reasoning with an inaccurate mechanism at some point in their assessment and were therefore categorized as having an elaborated teleological misunderstanding (see Footnote 4).

As with ambiguous misunderstandings, fewer children displayed basic and elaborated teleological reasoning after the intervention. While 28% of children displayed any kind of explicit teleological reasoning at the pretest, this dropped to 7% (n = 15) by the posttest. As at pretest, participants who employed teleological reasoning at posttest tended to use elaborated teleological reasoning. Only four participants employed basic teleological reasoning (2% of the sample, and 27% of the participants with teleological reasoning).
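Because the percentages in this section use different denominators (the full sample, children with any misconception, or children with a teleological misconception), it can help to recompute them from the counts. The sketch below is a simple arithmetic check using counts stated in the paper (the pretest counts of 11 basic and 50 elaborated responses are given in other paragraphs); the helper `pct` is hypothetical.

```python
# Counts stated in the paper.
N = 220            # full sample
pre_any = 188      # any misconception at pretest
pre_teleo = 61     # explicit teleological misconception at pretest
pre_basic = 11     # basic teleological at pretest
pre_elab = 50      # elaborated teleological at pretest
post_teleo = 15    # explicit teleological misconception at posttest
post_basic = 4     # basic teleological at posttest


def pct(part: int, whole: int) -> int:
    """Percentage rounded to the nearest whole number."""
    return round(100 * part / whole)


summary = {
    "teleological among children with a misconception (pretest)": pct(pre_teleo, pre_any),
    "basic among teleological (pretest)": pct(pre_basic, pre_teleo),
    "elaborated among teleological (pretest)": pct(pre_elab, pre_teleo),
    "teleological in full sample (pretest)": pct(pre_teleo, N),
    "teleological in full sample (posttest)": pct(post_teleo, N),
    "basic in full sample (posttest)": pct(post_basic, N),
    "basic among teleological (posttest)": pct(post_basic, post_teleo),
}

for label, value in summary.items():
    print(f"{label}: {value}%")
```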

Are teleological preconceptions associated with individual differences in expressive language and biological factual knowledge at pretest?

To better understand why some participants expressed explicit teleological misunderstandings at pretest and others did not, we explored whether children who stated teleological versus ambiguous preconceptions at pretest differed on individual difference measures. Given that explicit teleological reasoning reflects a fundamental misunderstanding of natural selection as a goal-directed event, we examined the possibility that children who stated teleological ideas knew fewer biological facts than other participants, reflecting a greater lack of general biological knowledge. In addition, given the argument that ambiguous misconceptions may reflect tacit teleological ideas expressed in a shortened form that omits the typical linguistic markers of teleology, we examined whether participants who expressed explicit teleological misunderstandings at pretest demonstrated higher or lower (written) expressive ability than those who demonstrated ambiguous misunderstandings. Biological factual knowledge was measured by counting the number of isolated fact questions children answered correctly. Expressive language was measured by counting the number of words that children used in their answers to the initial open-ended question and its follow-up prompts (e.g., “what happened to the orpeds with longer arms?”). The analysis focused on those 188 participants who demonstrated at least one misconception, either some kind of teleological misconception (n = 61) or an ambiguous misconception (n = 127). Linear regression revealed that participants who expressed only ambiguous misconceptions were as expressive (M = 98.35, SD = 50.85) as children with teleological misconceptions (M = 106.62, SD = 50.12), b = 8.28, F(1, 186) = 1.10, p = 0.295. However, a further linear regression revealed that children with teleological misconceptions answered more fact-based questions correctly (M = 3.20, SD = 1.74) than those who had ambiguous misconceptions (M = 1.72, SD = 1.65), b = 1.42, F(1, 185) = 29.71, p < 0.001. In fact, as Table 8 shows, children with teleological reasoning were more accurate on each of the six individual fact questions (see Footnote 5).
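A linear regression with a single two-level predictor is equivalent to a pooled-variance comparison of two group means, so the expressiveness result can be approximately recovered from the reported summary statistics. The sketch below illustrates that equivalence and is not the authors' analysis script; the function name is hypothetical, and because the inputs are rounded means and standard deviations, the slope differs slightly from the reported 8.28.

```python
def f_from_group_summaries(m1, sd1, n1, m2, sd2, n2):
    """Slope and F statistic for a regression on a binary group indicator.

    With one two-level predictor, the slope equals the difference in group
    means and F(1, n1 + n2 - 2) equals the squared pooled-variance t.
    """
    pooled_var = ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2)
    slope = m2 - m1
    se_squared = pooled_var * (1 / n1 + 1 / n2)
    return slope, slope**2 / se_squared


# Word counts: ambiguous (n = 127) versus teleological (n = 61) groups.
b, f_stat = f_from_group_summaries(98.35, 50.85, 127, 106.62, 50.12, 61)
print(f"b = {b:.2f}, F(1, 186) = {f_stat:.2f}")
```

The fact-knowledge regression is reported with F(1, 185), which suggests a slightly different sample, so this shortcut should not be expected to reproduce that result exactly.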

Table 8 The relation between pretest teleological explanation and knowledge of individual adaptation facts at pretest

Given the small number of participants who expressed basic teleological misconceptions at pretest, we compared participants with basic and elaborated teleological reasoning qualitatively on measures of expressiveness and biological factual knowledge. Overall, participants who expressed basic teleological ideas at pretest looked very similar to those who expressed elaborated teleological ideas at pretest on expressiveness (M = 111.55, SD = 44.40 and M = 105.54, SD = 51.64, respectively). Participants who expressed basic teleological ideas had slightly higher biological knowledge (M = 4.09, SD = 1.51) than those who expressed elaborated teleological misconceptions (M = 3.00, SD = 1.74). These same general individual difference patterns were observed at posttest except that children who had explicitly teleological misunderstandings no longer differed in biological factual knowledge from those with ambiguous misunderstandings.

Did teleological preconceptions impact children’s learning of natural selection more than ambiguous preconceptions?

A third goal of this study was to determine whether pretest teleological reasoning—especially elaborated teleological preconceptions—had a particularly strong impact on children’s ability to learn from the storybook intervention. We first assessed whether the presence of any misconception at pretest predicted whether participants would demonstrate a population-based understanding of natural selection (Level 3 or higher) at posttest. Logistic regression controlling for pretest factual knowledge revealed that children with misconceptions at pretest were no more or less likely to demonstrate a population-based understanding of natural selection at posttest than those who did not demonstrate any misconceptions at pretest, b = 0.64, F(1, 217) = 2.12, p = 0.145.

Next, we assessed the impact of explicitly teleological misconceptions on learning compared to other forms of misconceptions. We used logistic regression to predict the likelihood of demonstrating a population-based understanding of natural selection from pretest misconception category: either explicit teleological misconception or not. Because the expression of explicitly teleological ideas was associated with higher biological factual knowledge, we again controlled for participants’ pretest fact scores in these analyses. Analyses were restricted to children who demonstrated at least one misconception at pretest. This analysis indicated that pretest teleological reasoning was neither negatively nor positively predictive of participants’ tendency to demonstrate some level of accurate population-based understanding of adaptation at posttest, b = 0.30, F(1, 185) = 0.75, p = 0.387. That is, despite the fact that they were predominantly causally elaborated, teleological preconceptions were no more likely to help or hinder accurate mechanistic learning of natural selection than ambiguous misconceptions. Unsurprisingly, pretest factual knowledge did predict posttest natural selection understanding, b = 0.32, F(1, 185) = 11.54, p < 0.001. We also examined the effect of teleological reasoning for those participants who explicitly described an incorrect mechanistic explanation of natural selection at pretest. Logistic regression compared the likelihood of demonstrating a population-based understanding of natural selection at posttest for participants who demonstrated an ambiguous misconception (i.e., a mechanistic explanation but with no explicit teleological reasoning; n = 127) and for participants who demonstrated an elaborated teleological misconception (i.e., a mechanistic explanation with explicit teleological reasoning; n = 50). Again, there was no effect of pretest teleology, b = 0.27, F(1, 174) = 0.53, p = 0.468.

We also investigated the effect of pretest teleological reasoning on the expression of any misconception at posttest and on the ability to recognize misconceptions from the misconception prompts at posttest. In both cases, we controlled for pretest fact knowledge and restricted our analyses to participants with some form of misconception at pretest. Logistic regression revealed no effect of pretest teleological reasoning on the expression of misconceptions, b = −0.33, F(1, 185) = 0.61, p = 0.434, although pretest biological factual knowledge did negatively predict the expression of misconceptions at posttest, b = −0.21, F(1, 185) = 3.89, p = 0.049. A further linear regression revealed no effect of teleological reasoning on the ability to recognize misconceptions, b = 0.26, F(1, 185) = 2.66, p = 0.104.

As noted, only 5% of children (n = 11) expressed a basic teleological misconception at pretest. Given an absence of power, we therefore conducted qualitative analyses to explore the prediction that children with an elaborated teleological preconception at pretest might have a more difficult time learning and expressing an understanding of natural selection than those with an unelaborated basic teleological preconception. General patterns were consistent with this prediction. Although 73% of participants who expressed an explicit basic teleological understanding at pretest displayed some level of a population-based understanding of natural selection at posttest, this was only true of 60% of children who displayed an explicit elaborated teleological idea. Similarly, 20% of participants with an elaborated teleological misconception at pretest had some kind of misconception at posttest, whereas only 9% of participants who had a basic teleological misconception at pretest had a misconception at posttest, and participants with basic teleological misconceptions at pretest were able to identify more misconceptions at posttest (M = 2.91, SD = 0.30) than those with elaborated teleological misconceptions (M = 2.28, SD = 0.81).


Findings from the current school-based study extend prior research on scripted and controlled researcher-led interventions. They reveal that, after participating in a teacher-led storybook intervention, early elementary students in public school classrooms demonstrate substantial learning of natural selection. Children were not only better able to recognize inaccurate individual-based accounts of evolutionary change, they were also increasingly able to generate basic selectionist explanations of adaptation with their reasoning revealing reduced intrusion from various misconceptions that were strongly evident at pretest.

It is notable that students demonstrated these abilities given the minimal nature of the professional development teachers received. Most adults hold misconceptions about natural selection (e.g., Brown and Kelemen 2020), and teachers are no exception (e.g., Nehm et al. 2009; Rachmatullah et al. 2018). Therefore, successful teacher-led interventions might require professional development that provides teachers with information about natural selection as well as information about common misconceptions that students may exhibit. Although the professional development in the current study was brief, students still benefitted greatly from the intervention as a whole. Despite this positive outcome, further research is in progress to determine best practices for professional development around teaching natural selection.

It is also notable that children showed such marked improvements despite the highly circumscribed structure of the teacher-led classroom intervention and the challenge of generating written explanations rather than engaging in a talk aloud interview. Although some of these dynamics meant that the current learning outcomes were not as marked as in prior researcher-led storybook interventions (Kelemen et al. 2014; Emmons et al. 2016 and 2018), the learning effects were still strong. These results therefore converge with prior work to suggest that laying the foundation for a relatively comprehensive causal-explanatory understanding of evolutionary process is eminently achievable in elementary school. They underscore that young children are able to learn more than the disparate or limited facts about evolution that are commonly identified as learning targets in elementary science standards (see ACARA 2017; Achieve, Inc. 2013; National Curriculum for England 2014). Rather than being capable of learning only concrete or isolated facts, and consistent with a body of research that indicates they are abstract domain-specific theory-builders (e.g., Gelman 2013; Gopnik and Wellman 2012), children are able to construct and apply a basic but accurate understanding of evolutionary mechanism (Kelemen 2012). The current results therefore add to a growing evidence base that systematic causal-mechanistic teaching of one of the most counterintuitive but cornerstone ideas in the life sciences can and should commence in early elementary school (e.g., Campos and Sá Pinto 2013; Kelemen 2012 and 2019; Sá-Pinto et al. 2017; also, Nadelson et al. 2009). Further motivation for this proposal derives from recent findings that elementary children who successfully construct an understanding of adaptation by natural selection are more likely to also construct an accurate understanding of even more challenging larger-scale evolutionary concepts such as speciation and common descent (Ronfard et al. 2020a). In consequence, teaching natural selection in elementary school can lay a robust, and potentially enduring, foundation for the development of broader evolutionary literacy that, in a spiraling progression, would also aim in later grades towards incorporating other evolutionary processes (e.g., genetic drift).

The present findings also add to our understanding of children’s preconceptions about biological change and underscore how counterintuitive adaptation by natural selection is even for early elementary students. Specifically, 85% of our participants demonstrated intuitive misunderstandings about adaptation at pretest. While the majority of these misunderstandings were ambiguous with respect to teleological content (children identified transformation or development as a source of trait change without overt reference to a functional outcome), a third of children offered ideas that were explicitly teleological. Rather than being basic, most of these were elaborated by inaccurate causal mechanisms that, arguably, have psychological overtones (effort- or agent-based change). In consequence, their teleological misunderstandings took a form that, in adults and older students, has often been found or assumed to represent a particularly robust barrier to developing an accurate understanding of evolutionary mechanism (e.g., Barnes et al. 2017; Gregory 2009; Kampourakis 2018; Kelemen 2012; Nehm 2018; see Footnote 6). In contrast to adult patterns, however, our findings indicated that explicit teleological preconceptions, despite predominantly being elaborated ones, inhibited children’s learning of natural selection no more than ambiguous preconceptions. This finding somewhat aligns with suggestions from prior research that basic need-based teleological language may not be an excessive hindrance to young children’s learning, at least in relation to acquiring individual evolutionary concepts (Legare et al. 2013).

This interesting result therefore raises questions about the effects of teleological reasoning on learning with age and development. If teleological intuitions represent no special impediment to young children’s construction of a selection-based understanding of adaptation, relative to ambiguous transformational and developmental misconceptions, when, and in what contexts, do such ideas become a particular challenge to older students? One possibility is that with increased age, and additional formal and informal education on biology, children become more confident in their knowledge. Children’s increased confidence may be extended to their understanding of natural selection, often underpinned by teleological intuitions, thus further entrenching these incorrect ideas and making them more resistant to change. Unfortunately, however, exploring the relative impact of explicitly teleological reasoning on biological learning throughout development rests, in part, on first determining the extent to which explanations coded as ambiguous truly conceptually differ from those coded as explicitly teleological (see Table 1). That is, while ambiguous developmental and transformationist explanations had no explicit linguistic markers of teleological content, some might argue that they still involved implicit purpose-based assumptions. One reason for this is that, consistent with an Aristotelian view, it may not be possible to conceive of development as anything other than an intrinsically teleological process: it is, after all, directed towards fulfillment of a goal state (i.e., maturity). However, another more mundane reason for viewing the ambiguous explanations as implicitly teleological is that the context in which children invoked these transformational and developmental changes always involved assessments in which children were explaining a change towards a beneficial functional outcome. As such, children may have felt no need to explicitly mark that “orpeds changed/grew to having longer arms (to reach their food)” because in the context of the assessment materials (see Fig. 1), it may have been communicatively pragmatic to assume it was obvious that the change was goal-directed towards a beneficial outcome.

One counterargument to this proposal that the ambiguous explanations simply reduce to teleological explanations (such that it’s unsurprising that learning outcomes did not differ between the groups) is that there was a subtle difference between children who generated ambiguous versus explicit teleological explanations. Specifically, at pretest, children who stated explicitly teleological explanations displayed more biological factual knowledge than children who offered ambiguous explanations. In consequence, explicit teleological explanations seem to be the rational, inferential product of more biologically informed children who are actively theory-building, an orientation that could certainly end up neutralizing any learning advantage that children generating non-teleological ambiguous explanations might otherwise have had. Such a conjecture is, of course, highly speculative. Indeed, resolving more firmly whether ambiguous explanations are really conceptually distinct from explicit teleological explanations requires additional follow-up research: studies involving a range of learning assessments that probe an even wider range of contexts (Nehm 2018) and potentially ask children to predict (Shtulman 2006; Sá Pinto et al. 2013) as well as explain biological change outcomes (although see Gould 1990, for concerns about the scientific appropriateness of predicting evolution).


In closing, the present findings provide further evidence of the viability and effectiveness of coherent, comprehensive education on evolutionary mechanisms in elementary school. They also shed light on the prevalence and impact of explicit teleological preconceptions on children’s learning of natural selection, revealing that while they are frequent, young students are surprisingly good at overcoming them even after a circumscribed intervention. Further research will, however, need to examine the longer-term learning outcomes from this kind of teacher-led intervention. While prior studies have found that a researcher-led storybook intervention promotes a generalizable understanding of adaptation by natural selection for at least 3 months, it is unclear whether the same minimum longevity might hold true for the classroom learning documented here. Answering such a question is not simply a practical prerequisite for developing an effectively spaced learning progression on evolution in elementary school. It is also relevant to answering theoretical questions about the very nature of conceptual change, especially in light of the theoretical assumptions about dual processing and explanatory co-existence that guide the current work (e.g., Kelemen 2004 and 2019; Dunbar et al. 2007; Evans et al. 2011; Shtulman 2017; Zaitchik and Solomon 2009).

Specifically, it is assumed here that when children construct a theory of natural selection, their scientific learning serves to suppress rather than replace prior intuitively-based ideas, especially teleological ideas that reliably emerge in children’s reasoning about diverse natural phenomena across cultures and from early in development (e.g., Kelemen and DiYanni 2005; Schachner et al. 2017). Intuitive tendencies like these may remain as explanatory defaults that compete with counterintuitive scientific learning such that even when science learning occurs, as in the current research, children’s reasoning quickly reverts to ideas rooted in more automatic explanatory predilections unless the new learning is repeatedly reinforced and built upon (see Ronfard et al. 2020b; Shtulman et al. 2016). Cross-cultural studies of children’s enduring learning of counterintuitive ideas over extended time are therefore crucial to understanding how much children default back to their own prior conceptions and what factors affect such defaulting. This, in turn, can inform our understanding of the basic processes of conceptual development. Clearly, such studies are also relevant to designing evidence-based educational interventions that successfully build enduring scientific literacy from early in development. This is an increasingly pressing goal to pursue in the current evolutionary context of rapid climatic and environmental change.