Teaching and Learning About Evolution in Introductory Biology

Evolutionary science is one of the core explanatory frameworks for all modern biology (Dobzhansky 1973) and a particularly challenging subject for teaching and learning. A number of reports and studies have documented major obstacles undergraduate students encounter in learning about evolution: prior knowledge and misconceptions (Bishop and Anderson 1990; Jensen and Finley 1996; Alters and Nelson 2002; Anderson et al. 2002; Nehm and Reilly 2007), sociocultural and religious beliefs (Hokayem and BouJaoude 2008), and not least, a general lack of understanding of the nature of science (Lederman 1992; National Academy of Science 1998; Dagher and BouJaoude 2005). Furthermore, because evolution encompasses multiple scales of time and biological organization, it is difficult to directly observe and manipulate, especially in the context of a classroom or laboratory experience.

Teaching evolution effectively requires more than subject-matter knowledge. Awareness of the conceptual challenges described above and of the pedagogical practices that best support science learning is fundamentally important for designing effective evolution instruction. Not surprisingly, teaching and learning about evolution at the college level remains a prime subject of scholarly interest and discussion (Pennock 2005; Nelson 2008; Timmerman et al. 2008). Existing literature on science teaching and learning and numerous calls for reforming science education clearly indicate which fundamental pedagogies are most successful in promoting learning (Wyckoff 2001; National Research Council 2003, 2005; Smith et al. 2005). Experts advocate teaching science as a way of knowing through instructional strategies that actively engage students in their learning, uncover prior knowledge and conceptions, promote inquiry, and address both content and the nature of science. It follows from these guidelines that relevant and effective biology education should strive to reflect the current ways in which scientists think, practice, interact, and communicate about biology (National Research Council 2003, 2008; Handelsman et al. 2004; Ebert-May and Hodder 2008).

A major challenge in teaching and learning about all contemporary biology, including evolution, is that twenty-first century biology is a vast, complex, and deeply interconnected discipline that presents the conceptual challenges typified by complex systems (Songer and Mintzes 1994; Nunez and Banet 1997; National Research Council 2008; Verhoeff et al. 2008). Complex systems are broadly characterized by unifying themes: (a) they are composed of subunits or parts interconnected by networks of dynamic interactions and feedback loops, (b) they express emergent properties that result from interactions among their parts, and (c) their properties cannot be predicted or understood by studying the parts in isolation from the system (Bar-Yam 2003). Achieving deep conceptual understanding of evolutionary science requires thinking about it from a complex systems perspective. To understand evolution, learners not only need to know and understand key facts and concepts at multiple scales of biological organization (the “parts” of the system), but also must be able to explain evolutionary outcomes by appropriately connecting these concepts across scales of time and complexity (Fig. 1a). Research on teaching and learning about complex systems is gaining great momentum within the learning sciences (Hmelo-Silver and Azevedo 2006; Jacobson and Wilensky 2006) as scientists and educators become increasingly aware of the importance of systems thinking skills (Goldstone and Wilensky 2008; Verhoeff et al. 2008; Evagorou et al. 2009).

Fig. 1 a Evolution by natural selection involves events at multiple scales of biological organization (genes, organisms, populations). Conceptual understanding of the connections among these scales is critical for understanding evolution. b Our design for the assessment of students' knowledge and understanding of the key concepts of evolution by natural selection

Reforming Instruction on Evolution

We conducted this study in the context of a large-enrollment introductory biology course for life science majors at a large research university. This course is currently the subject of a comprehensive reform designed around the principles of scientific teaching (Handelsman et al. 2004, 2006) and backward instructional design (Wiggins and McTighe 1998). Reformed course sections are characterized by student-centered learning environments in which students are actively engaged in learning biology through activities and assessments that reflect scientific practice. Students work in cooperative learning groups to solve problems, discuss concepts and questions, construct and test hypotheses, and evaluate scientific data and models. The 15-week course focuses on principles of genetics, evolution, and ecology and consists of two weekly 80-minute class meetings and one weekly three-hour laboratory.

The teaching and learning activity described in the “Methods” section of this study represents one example of a backward instructional design approach to developing instruction on evolution. The fundamental principles of backward design include: (a) establishment of learning objectives followed by the design of activities and assessments that are aligned with the objectives, (b) analysis of assessment data in the context of the learning objectives, and (c) use of the instructor's reflections and assessment data to guide subsequent modifications of the instructional design (Wiggins and McTighe 1998).

We designed and implemented a classroom activity using Avida-ED (Pennock 2007a), an instructional tool that aligns with the broader goals of our course, namely, that students:

  1. actively learn about biology and the nature of science through inquiry;

  2. build meaningful connections among concepts across scales of time and biological organization;

  3. apply general concepts and theoretical frameworks to solve problems and to explain how biological systems work.


Avida-ED is a digital evolution software environment designed for teaching and learning about evolution and the nature of science in biology courses (Pennock 2007a). Avida-ED is built upon Avida, a well-established digital evolution model system widely used for research on evolution (Lenski et al. 2003; Hang et al. 2007; Ofria et al. 2008; Yedid et al. 2008). Use of model organisms and systems is common practice in basic biology research because it generates knowledge that is applicable to other, more complex organisms and systems that are impossible or impractical to study (Fields and Johnston 2005). Research on digital life in the Avida system does not inform us about the distinctive biology of any particular organism but instead is used to address questions about general evolutionary principles and to test hypotheses about evolutionary processes (Wilke and Adami 2002; Ofria and Wilke 2004). In the same way, Avida-ED is a model system for observing evolution in action in the laboratory and classroom. Importantly, Avida and Avida-ED are not simulations but are both actual instances of the evolutionary mechanism (Pennock 2007b).

As a learning tool, Avida-ED lends itself well to a variety of instructional strategies and supports the pursuit of multiple learning goals, as it promotes:

  (a) Conceptual understanding of evolution by natural selection. The fundamental principles underlying evolution by natural selection are instantiated in the Avida-ED model. Digital organisms are essentially small programs, much like self-replicating computer viruses. In the Avida-ED environment, “Avidians” are functionally comparable to bacteria in a petri dish. They populate a virtual petri dish containing digital “resources” that they could potentially metabolize; they reproduce asexually by fission; their “haploid genome” is replicated and each descendant inherits a copy of the parental genome. Replication is imperfect; therefore, genomes may mutate randomly, and mutations may result in a variety of phenotypes, such as the ability to metabolize different resources. Phenotypic variation may result in fitness differences among individuals: the differential ability to metabolize resources affects Avidians' relative survival and reproductive success. Ultimately, Avidian populations evolve in real time before the user's eyes.

  (b) Learning through inquiry. In Avida-ED, students have the ability to ask original, open-ended questions about evolution, set up experiments to test their hypotheses, and check their predictions against what they can observe in the evolving populations. Extremely short generation times and a rich dataset associated with each trial afford students the opportunity to obtain results quickly, replicate experiments multiple times, and refine their experimental questions and designs. Derived from a true research tool, Avida-ED allows learners to conduct authentic inquiry and thus belongs to a novel category, distinct from that of computer-based simulations (Pennock 2007a).

  (c) Learning about the nature of science. Posing and testing hypotheses are central activities of science, and science educators have long advocated the importance of providing students with opportunities to engage in cognitive activities similar to those of scientists. Science is often portrayed as a strictly empirical exercise, but the work of historians and philosophers of science has prominently identified conceptual problems as an integral part of past and present scientific practice (Stewart and Rudolph 2001). Avida-ED presents learners with the opportunity to address both conceptual and empirical problems and to practice connecting theory and evidence.

  (d) Complex systems thinking. Avida-ED offers learners the opportunity to experience the dynamics of a complex system. It provides a dynamic model of evolution by natural selection that can be manipulated, observed, and used in many ways. One of the major limitations in understanding complex systems in biology is that they comprise multiple scales of biological organization (Wilensky and Resnick 1999; Hmelo-Silver and Azevedo 2006). Generally, only one or a few of these scales can be directly experienced and observed at a time. Evolution by natural selection, for example, happens in populations as a result of the interaction between individual organisms and their environment. Individuals within a population differ from one another due to random changes in their genes. Understanding phenotypic variation within a population thus requires establishing a conceptual connection between the molecular, organismal, and population scales (Fig. 1a). Changes that occur at the population and molecular scales are difficult to grasp because they are difficult to directly observe and require both spatial and temporal perspectives. Avida-ED allows users to see both “the forest and the trees,” as users can easily shift between views that represent outcomes at different scales, ranging from whole populations to changes in the code of individual organisms.

  (e) Transferring conceptual knowledge across different cases of evolution. Experts are able to discern underlying patterns (unifying principles or theories) in different instances of the same phenomenon. Furthermore, experts categorize seemingly very different problems on the basis of their “deep” conceptual features, rather than on the surface features that typically capture novices' attention (Larkin et al. 1980; Chi et al. 1981; Kozma and Russell 1997). The expert ability to transfer conceptual understanding and to apply theories and principles to explain why and how natural events happen is a highly desirable outcome in higher education. Novice learners can achieve deeper understanding of abstract concepts and theories through practice with multiple representations of the same concept or phenomenon (Ainsworth 1999). In doing so, learners move from the surface features of representations to the deeper conceptual structures and principles. The Avida-ED platform provides a dynamic and interactive alternative representation of evolution that, in conjunction with other exemplars and cases, affords students more opportunities to test and solidify their understanding of the principles of evolution by natural selection.
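The mechanism summarized in item (a) above (imperfect replication, inheritance, and fitness differences among variants) can be illustrated with a deliberately small sketch. This is not Avida-ED's code, genome representation, or instruction set. The binary genome, the fitness function, and every constant below are hypothetical choices made only to show the logic of mutation plus differential reproduction.

```python
import random

# All constants below are hypothetical values chosen for illustration only.
GENOME_LENGTH = 20     # number of binary sites in each toy genome
MUTATION_RATE = 0.01   # chance that a site is miscopied during replication
POPULATION_SIZE = 100  # fixed number of organisms the "dish" can hold


def replicate(genome, rng):
    """Copy a genome imperfectly: each site flips with probability MUTATION_RATE."""
    return [site ^ 1 if rng.random() < MUTATION_RATE else site for site in genome]


def fitness(genome):
    """Toy fitness (an assumption): 1 plus the number of sites set to 1."""
    return 1 + sum(genome)


def next_generation(population, rng):
    """Sample parents in proportion to fitness; offspring inherit mutated copies."""
    weights = [fitness(genome) for genome in population]
    parents = rng.choices(population, weights=weights, k=len(population))
    return [replicate(parent, rng) for parent in parents]


def run(generations=200, seed=1):
    """Evolve a population of identical ancestors; return it with mean fitness per generation."""
    rng = random.Random(seed)
    population = [[0] * GENOME_LENGTH for _ in range(POPULATION_SIZE)]
    history = []
    for _ in range(generations):
        population = next_generation(population, rng)
        history.append(sum(fitness(g) for g in population) / len(population))
    return population, history
```

Calling `run()` with different seeds plays the role of independent replicates: the same rules apply each time, but mutations arise at different sites and times, so the trajectories of mean fitness differ from run to run.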


Avida-ED Activity Design

We designed and tested an in-class activity that used Avida-ED as a model for studying evolution by natural selection. This activity was part of the instruction on evolution in one section of an introductory biology course (enrollment = 194 students) for life science majors, described in the “Introduction” section. The evolution unit comprised approximately four weeks of the course and followed instruction on genetics (DNA structure and function, cell division, and inheritance). We formulated several broad learning outcomes for our course (see the “Introduction” section) and more specific learning objectives for each unit (genetics, ecology, evolution). Every class meeting and learning activity was designed with one or more of these objectives in mind. We selected existing, or designed new, assessment instruments based on their potential for providing evidence of achievement of the desired learning outcomes.

The specific objectives for the Avida-ED activity directly aligned with one of our broader evolution learning goals (Table 1). The instructor implemented the Avida-ED activity during one class meeting toward the end of the evolution unit after students had practiced their thinking about evolutionary concepts by working with other examples, models, and systems. Students prepared for this activity by completing the following as homework:

  a. Downloaded the Avida-ED software onto their laptops (students regularly brought laptops to class);

  b. Read an article about Avida that appeared in the popular press (Testing Darwin by Carl Zimmer in Discover Magazine, February 2005);

  c. Reviewed the Avida-ED user manual to become familiar with the appearance of the interface and specific terminology (e.g., Avidians).

Table 1 Learning objectives for the Avida-ED activity were designed in the context of one of our broader course learning goals

At the beginning of class time, the instructor spent approximately 15–20 minutes projecting the Avida-ED interface and running a sample trial. During this time, the instructor provided an overview of the interface, directed students to the places in the software where they would find tools and data relevant to the activity, and answered students' questions (mostly about the organization of the software and the conceptual interpretation of the Avida model). Students used a worksheet (available for download) that included the activity objectives, instructions, blank tables for recording data, and questions about interpretation of their results. Students worked on the activity in their cooperative groups in class and completed their group assignment as homework.

The activity was structured into two parts, each addressing different implicit objectives derived from what the instructor had observed as problematic in students' understanding of evolution by natural selection (Table 1):

  • Part I. Objective: to observe the random occurrence of mutations. Students observed the evolution of a population in Avida-ED under certain fixed conditions. They repeated this experiment at least twice, recorded and plotted data in charts, then compared the outcome of their independent replicates. This protocol gave students the opportunity to witness the random appearance of different traits and subsequent change in their frequencies in a population during different trials.

  • Part II. Objective: to observe and conclude that fitness is relative to the environment. Students observed the effect of changing the resource availability on different Avidians' reproductive success. They used a mutant (isolated from a previous trial) that was capable of metabolizing a given resource and compared the reproductive success of that mutant strain to that of a “wild-type” strain. To do so, students grew the two strains concurrently, both in the presence and in the absence of the specific resource. This part of the activity gave students the opportunity to observe that ability to use a specific resource does not confer a selective advantage (i.e., does not increase fitness) when that resource is absent from the environment. Also, students could observe that new mutations enabling Avidians to use certain resources may occur regardless of whether those resources were present in the environment.
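The reasoning behind Part II can be made concrete with a short calculation. The sketch below assumes two asexual strains that start at equal frequency and grow side by side, with the mutant replicating faster only when the resource it can metabolize is present. The function name, the replication rates, and the assumption that the ability carries no cost are all hypothetical and are not taken from Avida-ED.

```python
def mutant_frequency(generations, resource_present, start_freq=0.5,
                     base_rate=1.0, resource_bonus=0.2):
    """Frequency of the mutant strain after growing alongside the wild type.

    base_rate and resource_bonus are hypothetical per-generation
    replication rates; the bonus applies only if the resource is present.
    """
    mutant_rate = base_rate + (resource_bonus if resource_present else 0.0)
    wild_rate = base_rate
    mutant = start_freq * mutant_rate ** generations
    wild = (1.0 - start_freq) * wild_rate ** generations
    return mutant / (mutant + wild)


with_resource = mutant_frequency(20, resource_present=True)
without_resource = mutant_frequency(20, resource_present=False)
print(f"resource present: {with_resource:.2f}, resource absent: {without_resource:.2f}")
```

Under these assumptions the mutant comes to dominate only when the resource is present. When it is absent, both strains replicate at the same rate and the mutant's frequency stays at its starting value, which is the sense in which fitness is relative to the environment.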

Assessment of Student Learning About Evolution

We focused assessment of student learning on five key principles of evolution by natural selection (Anderson et al. 2002; Nehm and Schonfeld 2008). The principles below represent the core theoretical underpinnings of all the examples discussed in class, including Avida-ED:

  1. 1.

    Variation: there is phenotypic variation between individuals of a population.

  2. 2.

    Origin of variation: phenotypic variation among individuals has a genetic basis (is the result of random mutations in genes).

  3. 3.

    Inheritance: genetically determined traits are inherited by offspring.

  4. 4.

    Fitness: individuals in a population have different survival and reproductive success, based on the environment acting on their heritable phenotypic traits.

  5. 5.

    Evolutionary change in populations: the frequencies of phenotypes (and ultimately of the alleles responsible for these phenotypes) change in populations over time.

Understanding each of the principles above requires connecting concepts across different scales of biological organization in the context of the environment (Fig. 1a).

We used three different assessment tools (Fig. 1b), each designed to detect different facets of understanding of the five key principles above: the Conceptual Inventory of Natural Selection (CINS; Anderson et al. 2002) and two open-response instruments, which we refer to as the “Dino Problem” and the “Concept Frame.”

The CINS is a selected-response instrument designed to measure knowledge of ten fundamental principles of evolution by natural selection (Anderson et al. 2002). Although originally designed for use with nonscience majors, the CINS has been shown to be valid and reliable for science majors as well (Nehm and Schonfeld 2008). The instrument consists of 20 multiple-choice questions (two questions for each of the ten key concepts). For our analysis, we selected ten CINS items that address the five key principles of evolution described above (CINS questions 4, 6, 7, 9, 10, 13, 16, 17, 18, and 19). All mentions of the CINS in the context of this study refer to this ten-item subset.
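As an illustration of how a score on the ten-item subset could be tallied, the sketch below counts correct answers on the listed question numbers only. The item numbers come from the text. The function name and the dictionary layout are hypothetical, and the answer key must be supplied by the user because it is not reproduced here.

```python
# Question numbers of the ten CINS items used in this study (from the text).
SELECTED_ITEMS = (4, 6, 7, 9, 10, 13, 16, 17, 18, 19)


def subset_score(responses, answer_key):
    """Count correct answers on the selected items; other items are ignored.

    responses and answer_key map a question number to the chosen option.
    A missing response counts as incorrect.
    """
    return sum(1 for item in SELECTED_ITEMS
               if responses.get(item) == answer_key[item])
```

A student who answers every selected item correctly receives ten points regardless of performance on the remaining ten questions.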

The “Concept Frame” (Fig. 2) is a constructed-response tool we designed for assessing students' ability to apply the five key principles to two different cases of evolution by natural selection they had worked with in class: the evolution of various metabolic abilities in populations of digital organisms in Avida-ED and the evolution of corolla length in wild tobacco populations growing in different environments. The instrument is a grid in which column headers represent the specific cases and row headers represent the five key principles. Students were asked to complete the empty cells with brief explanations of how each principle applied to each example.

Fig. 2 Concept Frame. The instructions given to students were: “Two different examples (models) of evolution by natural selection, which were discussed in the classroom, are represented in the following table. Explain how each of the five concepts listed on the left applies to each of the models”

The “Dino Problem” (Fig. 3) is a constructed-response instrument that presents students with a hypothetical scenario (depicted in a cartoon) and asks them to articulate an explanation for populations' change over time based on their understanding of evolution by natural selection. In the cartoon, dinosaur-looking animals and plants are represented at three successive points in time as they become progressively taller over time. Students' explanations provide evidence of their ability to recall, apply, and connect multiple principles of evolution by natural selection as they construct a scientific explanation about a novel problem. The cognitive abilities addressed by each instrument (Fig. 1b) are hierarchically nested within each other (Bloom 1956; Anderson et al. 2001) and are regarded as interdependent.

Fig. 3 The Dino Problem. Cartoon adapted from an original work by Frank Hauser, Jr., published in Science, vol. 250, p. 1103 (1990)

The CINS and the Dino Problem were used as pre-instruction and post-instruction assessments. The pretest was administered in class before the instruction on evolution and the post-test was embedded in the final exam. In both cases, students were rewarded with points for completing the assessments but were not graded for correctness of their responses. The instructor used the pretest to evaluate students' baseline knowledge of evolution, but did not provide students direct feedback about it. Students did not receive study materials or rubrics related to these questions at any time in the semester. The Concept Frame was part of the midterm exam that immediately followed the evolution unit, and students received a grade for correctness of their explanations.

Research on Avida-ED use in the classroom received approval from the local institutional review board (IRB), and 155 of the 188 students who completed the course voluntarily agreed to participate in this study by signing IRB-approved informed consent forms. The assessment data analysis was performed after the end of the course; in this study, we analyzed and reported only data from students who completed all assessments (pretest, post-test, and midterm exam) and gave informed consent. Applying these criteria resulted in a complete dataset for 124 subjects (n), representing approximately two thirds (66%) of the student population.

Rubric Design and Inter-Rater Reliability

Our instructional team has used the Dino Problem in several iterations of different introductory biology courses. Critical reading of hundreds of post-instruction student responses to this problem clearly indicated that the best student answers included all five key principles of evolution by natural selection, as follows:

  (a) described the existence of phenotypic variation (tall and short individuals) among both animals and plants in the initial frame of the cartoon (variation);

  (b) referred to random genetic mutation, or different alleles, as the reason for the observed differences (origin of variation);

  (c) explained that genes responsible for the different traits were inherited (inheritance);

  (d) explained that individuals with characteristics better suited to their environment survived and reproduced more, e.g., taller trees escaped herbivory and taller animals had access to more resources than shorter animals (fitness);

  (e) concluded that natural selection resulted in these two populations changing over time (change in a population).

Based on these observations, we created a pilot rubric that assigned a total of ten points to the assessment (two points for each principle or conceptual category). For each of the five conceptual categories, we assigned a score of:

  (a) zero, if the principle was completely missing or wrong (misconception);

  (b) one, if the principle was present and correct, but rather incomplete (novice interpretation);

  (c) two, if the principle was present, correct, and complete (expert interpretation).

Two raters independently applied the pilot rubric to code approximately one third of the students' answers to the Dino Problem (39 out of 124) for both pretest and post-test. After two iterations of application and refinement, we obtained a rubric that yielded 89% agreement on the pretest and 86% on the post-test. The data presented result from the application of the final rubric (Table 2) by a single calibrated rater.
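The scoring and agreement calculations can be sketched as follows. The text does not state how agreement was computed, so this example assumes simple percent agreement over individually coded category scores. The function names and the sample codes are hypothetical.

```python
CATEGORIES = ("variation", "origin of variation", "inheritance",
              "fitness", "change in a population")


def total_score(category_scores):
    """Sum the 0/1/2 scores over the five categories (maximum of ten)."""
    for category in CATEGORIES:
        if category_scores[category] not in (0, 1, 2):
            raise ValueError(f"invalid score for {category}")
    return sum(category_scores[category] for category in CATEGORIES)


def percent_agreement(rater_a, rater_b):
    """Percentage of coded items on which two raters gave the same score."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("raters must score the same non-empty set of items")
    matches = sum(1 for a, b in zip(rater_a, rater_b) if a == b)
    return 100.0 * matches / len(rater_a)


# Hypothetical codes from two raters for one answer (five categories).
rater_1 = [2, 0, 1, 2, 1]
rater_2 = [2, 0, 1, 1, 1]
print(percent_agreement(rater_1, rater_2))
```

Percent agreement is easy to interpret but does not correct for agreement expected by chance. A chance-corrected statistic would be a different technique from the one sketched here.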

Table 2 Coding rubric for the Dino Problem

We applied an analogous procedure to design and calibrate a coding scheme for the Concept Frame. Two raters independently applied a pilot rubric to code approximately one third of the students' Concept Frames (38 out of 124). After comparing and discussing the results among raters, we refined the rubric, obtaining an overall 88% agreement. The data we present result from application of the rubric, described below, by a single calibrated rater.

We used a binary coding scheme that assigned a score of either zero or one to each cell of the frame for a maximum of ten possible points. We formulated five simple criteria, one for each of the key principles assessed, applicable to both tobacco and Avida-ED. Each cell in the frame was scored independently of the others with only a few exceptions (five out of 124 students misplaced some of their answers). For the purpose of this study, we were not interested in the students' ability to follow instructions but rather in their conceptual understanding of evolution; therefore, we accepted correct responses even in the rare cases in which they appeared in the “wrong” cell. For each principle listed below, an explanation earned a score of one if it identified:

  • the phenotypic trait that varies in the populations (variation);

  • changes (or mutations) in genes (computer code, in the case of Avida-ED) as the initial source of variation (origin of variation);

  • inheritance of genetically determined traits (alleles, genes, mutations) (inheritance);

  • differential fitness—or reproductive success—as the result of the environment selecting certain phenotypes (fitness);

  • a change over time in populations, manifested as a shift toward certain phenotypes (change in a population).
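A minimal sketch of the binary scheme, assuming each coded frame is stored as a mapping from a (principle, case) pair to zero or one. The data layout and function name are hypothetical. The principle and case labels follow the text.

```python
PRINCIPLES = ("variation", "origin of variation", "inheritance",
              "fitness", "change in a population")
CASES = ("Avida-ED", "tobacco")


def frame_score(coded_cells):
    """Add up the binary codes of the ten cells; uncoded cells count as zero."""
    total = 0
    for principle in PRINCIPLES:
        for case in CASES:
            value = coded_cells.get((principle, case), 0)
            if value not in (0, 1):
                raise ValueError(f"cell {(principle, case)} must be coded 0 or 1")
            total += value
    return total
```

Because every cell is coded independently, a student can earn credit for a principle in one case and not in the other, which is what allows the instrument to reveal whether understanding transfers between the two examples.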

Data Analysis

Although our sample size is relatively large (n = 124), our data did not meet the assumptions of normal distribution and equal measurement intervals; we therefore used a nonparametric test (the Wilcoxon matched-pairs signed-ranks test) to establish whether differences between pretest and post-test scores were statistically significant. Because we performed multiple Wilcoxon tests for both the CINS and the Dino Problem, we lowered the significance level for each test to α = 0.005.
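For readers unfamiliar with the test, the following sketch shows how a Wilcoxon matched-pairs signed-rank statistic can be computed from paired scores. The text does not say which software or which variant of the test was used. This version assumes the large-sample normal approximation with a tie correction and no continuity correction, and the scores shown are invented for illustration. In practice, a statistics library (for example SciPy's `scipy.stats.wilcoxon`) would typically be used instead.

```python
import math

ALPHA = 0.005  # per-test significance level stated in the text


def wilcoxon_signed_rank(pre, post):
    """Return (W, p) for a two-sided Wilcoxon matched-pairs signed-rank test.

    Assumes the normal approximation with a tie correction and no
    continuity correction; pairs with zero difference are dropped.
    """
    diffs = [b - a for a, b in zip(pre, post) if b != a]
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")
    # Rank absolute differences, giving tied values their average rank.
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        tie_size = j - i + 1
        tie_term += tie_size ** 3 - tie_size
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    w_minus = sum(r for r, d in zip(ranks, diffs) if d < 0)
    mean = n * (n + 1) / 4
    variance = n * (n + 1) * (2 * n + 1) / 24 - tie_term / 48
    z = (w_plus - mean) / math.sqrt(variance)
    p_value = math.erfc(abs(z) / math.sqrt(2))  # two-sided tail probability
    return min(w_plus, w_minus), p_value


# Hypothetical pretest and post-test scores for ten students (not study data).
pre_scores = [2, 3, 5, 4, 6, 1, 3, 5, 2, 4]
post_scores = [5, 6, 7, 8, 9, 4, 7, 8, 6, 9]
w, p = wilcoxon_signed_rank(pre_scores, post_scores)
print(f"W = {w}, significant at alpha = {ALPHA}: {p < ALPHA}")
```

The test uses only the signs and ranks of the paired differences, which is why it suits ordinal rubric scores that do not meet the assumptions of a paired t-test.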


Students' Background Knowledge of Evolution by Natural Selection

Prior to evolution instruction, we measured students' initial knowledge and understanding of the five key principles of evolution by natural selection, using both the CINS and the Dino Problem. The combined results from these two instruments provided us a more complete picture of students' knowledge of evolutionary concepts and processes. The CINS allowed us to detect students' baseline knowledge of individual concepts and principles, while analysis of the Dino Problem revealed what principles (concepts and connections) students incorporated in their reasoning about the process of evolution. The average score of the Dino Problem pretest (24%) was lower than that of the CINS (51%; Fig. 4a). This was not surprising, given the higher level of challenge posed by the constructed-response instrument. While students were able to select correct answers among the distracters on approximately half the CINS items, they clearly did not have the understanding of the process required to craft a complete and accurate explanation of how evolution by natural selection may occur. We observed that, when articulating their explanations about evolution in the Dino Problem, students primarily focused on concepts at the organismal level. The most represented principles in students' answers were those of phenotypic variation [V], connecting the scales of organisms and populations, and of differential fitness [F], connecting organisms to their environment (Fig. 5, pretest). The principles most prominently missing in students' constructed responses were those involving connections with genetics concepts (the origin of variation, the inheritance of genetically determined phenotypes, and the change in allele frequency in populations). Interestingly, the pattern of student answers to the Dino Problem identified the “systems thinking” difficulty of conceptually connecting the different levels of biological organization (i.e., organisms with genes and populations).

Fig. 4 Students' knowledge about five key principles of evolution by natural selection was measured with the CINS and the Dino Problem before and after instruction on evolution (n = 124). The maximum possible score on each instrument was ten points. a The mean score for both instruments improved after instruction. Application of the Wilcoxon matched-pairs signed-rank test indicated that the changes were statistically significant (p < 0.005). b A visual representation of the data, indicating the pretest–post-test score distribution among individual students. The gray-shaded squares at the intersection of pretest and post-test scores represent the number of students who received a given combination of scores. Diagonals indicate no change between pretest and post-test; individuals below the diagonal performed better in the pretest than in the post-test. The majority of individuals improved their scores in the post-test and are, therefore, represented above the diagonal

Fig. 5 Dino Problem pretest–post-test scores (n = 124) broken down by conceptual categories (V variation, O origin of variation, I inheritance, F fitness, P population change). The charts illustrate the percentage of students who correctly applied each given principle in their explanation of evolution by natural selection. Based on our rubric in Table 2, a score of two indicates an expert answer in a given category (gray bars), while a score of one indicates a novice answer (white bars). Numbers above the bars indicate the overall percentage of student answers that included a given principle (either at the novice or at the expert level). The Wilcoxon matched-pairs signed-rank test, applied independently to student scores in each of the five categories, indicated that the pre-to-post improvement was statistically significant in all categories (p < 0.005)

Data from the pretest supported the validity of our learning objectives (Table 1) and further highlighted the need for emphasizing connections across levels of biological organization as a key to meaningful learning about evolution. Subsequent instruction on evolution focused on the genetics concepts students had already learned about in the course (mutation, genotype, phenotype, heredity) and their application in the context of organisms, populations, and their environment to explain the mechanism of natural selection.

Students' Learning about Evolution in the Course

Comparison of students' performance on the CINS and the Dino Problem before and after instruction revealed statistically significant post-instruction learning gains, measured by both instruments (Fig. 4a). As with the pretest, the mean post-test score was higher for the selected-response instrument (CINS, 68%) than for the constructed-response instrument (Dino Problem, 41%). Statistical analysis of student data for the individual CINS items (Table 3) revealed that students significantly improved their level of knowledge and comprehension of the key principles of origin of variation, fitness, and change in a population. In this study, we found no statistically significant difference between pretest and post-test for the CINS questions on variation and inheritance (Table 3).

Table 3 Selected CINS items used in this study to detect pretest and post-instruction knowledge of principles of evolution by natural selection

Analysis of the Dino Problem pre–post scores for each of the five conceptual categories revealed that, after the instruction on evolution, the percentage of students who correctly included each of these fundamental principles increased significantly (Fig. 5).
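To make the per-category comparison concrete, the following is a minimal sketch of a two-sided Wilcoxon matched-pairs signed-rank test applied to paired rubric scores for a single category. It is not the analysis code used in this study: the function name `wilcoxon_signed_rank`, the normal approximation without continuity correction, and the twelve pairs of scores are all assumptions made for illustration.

```python
import math


def wilcoxon_signed_rank(pre, post):
    """Two-sided Wilcoxon matched-pairs signed-rank test (normal approximation).

    Returns (w_plus, w_minus, z, p). Pairs with a zero difference are dropped,
    and tied absolute differences receive the average of the ranks they span.
    """
    diffs = [b - a for a, b in zip(pre, post) if b != a]
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")
    ordered = sorted(diffs, key=abs)
    ranks = [0.0] * n
    tie_term = 0.0
    i = 0
    while i < n:
        j = i
        while j < n and abs(ordered[j]) == abs(ordered[i]):
            j += 1
        avg_rank = (i + 1 + j) / 2  # mean of the ranks i+1 .. j
        for k in range(i, j):
            ranks[k] = avg_rank
        t = j - i
        tie_term += t**3 - t  # tie correction for the variance
        i = j
    w_plus = sum(r for d, r in zip(ordered, ranks) if d > 0)
    w_minus = sum(r for d, r in zip(ordered, ranks) if d < 0)
    mean = n * (n + 1) / 4
    var = n * (n + 1) * (2 * n + 1) / 24 - tie_term / 48
    z = (w_plus - mean) / math.sqrt(var)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided p-value
    return w_plus, w_minus, z, p


# Hypothetical 0/1/2 rubric scores for one category (not data from this study).
pre = [0, 0, 1, 0, 1, 0, 0, 1, 0, 2, 1, 0]
post = [1, 2, 1, 1, 2, 0, 1, 2, 1, 2, 0, 2]
w_plus, w_minus, z, p = wilcoxon_signed_rank(pre, post)
```

Because rubric scores take only three values, ties are frequent, which is why the sketch averages tied ranks and corrects the variance. With very small samples, an exact test would be preferable to the normal approximation used here.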

However, the frequency of concepts in the Dino Problem post-test (Fig. 5, right panel) indicated that students, after the instruction on evolution, still:

  (a) primarily focused their explanations on organisms and populations. The principles of phenotypic variation within populations, differential fitness, and change in populations over time were included (at either the novice or the expert level) in 75%, 72%, and 65% of students' responses, respectively;

  (b) largely neglected the genetic component of the evolutionary process. Only a small percentage of student answers earned the expert score (two points) in the categories of origin of variation, inheritance, and change in a population (10%, 17%, and 13%, respectively). For these three categories, student responses earned a score of two only if they included explicit connections to genetics concepts (e.g., reference to mutations in genes as the ultimate origin of variation [O], inheritance of traits that are genetically determined [I], and change of allele frequencies in populations [P]; see rubric in Table 2).
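The two percentages reported per category (answers that included a principle at any level, and answers that reached the expert level) can be tallied as in the sketch below. It assumes each answer has already been hand-scored 0, 1, or 2 in each of the five categories; the function names and the four scored answers are hypothetical and serve only to illustrate the bookkeeping.

```python
# Category codes follow the text: variation, origin of variation,
# inheritance, fitness, population change.
CATEGORIES = ("V", "O", "I", "F", "P")


def total_score(scores):
    """Sum the 0/1/2 category scores of one answer (maximum of ten points)."""
    for cat in CATEGORIES:
        if scores[cat] not in (0, 1, 2):
            raise ValueError(f"score for {cat} must be 0, 1, or 2")
    return sum(scores[cat] for cat in CATEGORIES)


def category_percentages(answers):
    """Return {category: (percent scoring at least 1, percent scoring 2)}."""
    n = len(answers)
    result = {}
    for cat in CATEGORIES:
        included = sum(1 for a in answers if a[cat] >= 1)
        expert = sum(1 for a in answers if a[cat] == 2)
        result[cat] = (100 * included / n, 100 * expert / n)
    return result


# Hypothetical scored answers, invented for illustration only.
answers = [
    {"V": 2, "O": 0, "I": 1, "F": 1, "P": 1},
    {"V": 1, "O": 2, "I": 2, "F": 2, "P": 2},
    {"V": 0, "O": 0, "I": 0, "F": 1, "P": 0},
    {"V": 1, "O": 0, "I": 0, "F": 0, "P": 1},
]
percentages = category_percentages(answers)
```

Keeping the "included" and "expert" counts separate mirrors the distinction drawn above between answers that mention a principle and answers that connect it explicitly to genetics.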

By placing individual students in a matrix in which their pretest and post-test scores provide the coordinates, we obtained an alternative representation of the data, which focuses on the individual students' pre-instruction–post-instruction change (Fig. 4b). The majority of students achieved a higher score on the post-test than on the pretest (64% improved their score on the CINS and 61% improved their score on the Dino Problem). Interestingly, even after instruction on evolution, 19 out of 124 students wrote explanations of the evolutionary process (for the Dino Problem) that did not include any relevant principle of evolution by natural selection. Typically, these and other student answers that earned very low scores (one or two points out of ten possible) either contained very generic, unstructured, narrative accounts of evolution or included fundamental misconceptions, primarily the idea that organisms “need to” evolve or evolve for a purpose (e.g., plants evolved taller height to escape herbivory, then animals evolved long necks in order to be able to eat).
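A pretest-by-post-test matrix of this kind, along with the share of students who improved, can be built with a few lines of code. The sketch below is illustrative only: the function names and the eight pairs of scores are hypothetical, and the scores are assumed to be aligned so that position i in each list refers to the same student.

```python
from collections import Counter


def change_matrix(pre, post):
    """Count the students found at each (pretest, post-test) coordinate."""
    return Counter(zip(pre, post))


def change_summary(pre, post):
    """Percentage of students who improved, stayed the same, or declined."""
    n = len(pre)
    improved = sum(1 for a, b in zip(pre, post) if b > a)
    same = sum(1 for a, b in zip(pre, post) if b == a)
    declined = n - improved - same
    return {
        "improved": 100 * improved / n,
        "same": 100 * same / n,
        "declined": 100 * declined / n,
    }


# Hypothetical paired scores out of ten (not data from this study).
pre = [2, 3, 0, 5, 4, 1, 3, 2]
post = [4, 3, 0, 7, 3, 2, 6, 5]
matrix = change_matrix(pre, post)
summary = change_summary(pre, post)
```

Cells above the diagonal of the matrix (post-test score higher than pretest score) correspond to the "improved" group, so the summary is simply the matrix collapsed into three regions.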

Avida-ED and Cross-Domain Application of the Principles of Evolution

We used the Concept Frame described in the “Methods” section (Fig. 2) to assess students' ability to apply each of the five key principles of evolution by natural selection in the context of the Avida-ED model and of a natural system (wild tobacco). No specific connection existed between Avida-ED and the case of wild tobacco evolution, other than that they were both examples of evolution the students were familiar with from using them as in-class activities. Because the same underlying principles of natural selection apply to these—as well as to any other case of evolution—we aligned the two cases in this assessment.

Because we had embedded specific learning objectives in the Avida-ED activity design, we were particularly interested in finding out whether, after working with Avida-ED, students understood and could explain in the context of both the digital and the natural model:

  • the genetic origin of variation within a population (random mutations) and

  • the relative nature of fitness (dependent on phenotypes and the environment).

The outcomes of this assessment, illustrated in Fig. 6, offer multiple insights on student learning:

Fig. 6

Analysis of the Concept Frame. For each of the five key principles, the graph and table indicate what percentage of students (n = 124) provided correct explanations for both Avida-ED and tobacco (dark gray), for tobacco only (white), or for Avida-ED only (light gray)

Variation and the Origin of Variation

While most students (78%) correctly identified corolla length as the trait responsible for phenotypic variation in wild tobacco populations, only 41% of the students demonstrated a clear understanding of the nature of phenotypic variation among Avidians (the ability to metabolize different resources). Most of the answers we coded as “incorrect” indicated that students understood at the surface level that Avidians were phenotypically different from one another, but had no clear idea of what made them so. Avidians' “color” was the most common feature students used as a descriptor of phenotypic variation. This is not incorrect per se (Avidians are indeed “color-coded,” based on their metabolic rate or generation time), but their color is essentially a proxy for differences in fitness among individuals, which result from being able to perform complex functions and use resources.

Interestingly, while most students clearly identified what phenotypic trait varied in wild tobacco populations, only 25% of them correctly ascribed the origin of such variation to random genetic mutations in the tobacco genome. The most common causal explanation students provided for the existence of different corolla lengths within populations was some environmental factor (climate, pollinators, etc.), either with no reference to the mechanism by which the environment caused variation or with a reference to environment-dependent natural selection being the cause of variation (rather than acting on it). Conversely, a high percentage (40%) of students correctly explained that random mutations in the code led to the phenotypic variation observed among Avidians (even though some of these students were confused about the actual nature of the phenotypes).

We suspect that this pattern is due to the fact that mutation is one of the “rules” made explicit in Avida-ED. In the tobacco case, on the other hand, no explicit mention of mutations had occurred in class, since the tobacco activity was primarily geared toward understanding fitness. To complete the Concept Frame, therefore, students needed to make an inference regarding the origin of variation in tobacco populations. Several students (22.6%) completed the frame with a correct explanation of the genetic origin of variation in the context of Avida-ED, but not of tobacco plants (Fig. 6), which we may interpret as an inability to transfer a principle across distinct cases of evolution.
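The grouping used in Fig. 6 (correct for both cases, for tobacco only, or for Avida-ED only) amounts to a simple cross-classification of each student's two coded explanations. The sketch below shows one way to compute it for a single principle; the function names, the added "neither" group, and the eight coded responses are assumptions made for illustration.

```python
from collections import Counter

LABELS = ("both", "tobacco only", "Avida-ED only", "neither")


def classify(avida_correct, tobacco_correct):
    """Label one student's pair of explanations for a single principle."""
    if avida_correct and tobacco_correct:
        return "both"
    if avida_correct:
        return "Avida-ED only"
    if tobacco_correct:
        return "tobacco only"
    return "neither"


def frame_percentages(responses):
    """Percentage of students in each group for one principle."""
    counts = Counter(classify(a, t) for a, t in responses)
    n = len(responses)
    return {label: 100 * counts[label] / n for label in LABELS}


# Hypothetical coded responses: (Avida-ED correct?, tobacco correct?)
responses = [
    (True, True), (True, False), (False, False), (True, False),
    (False, True), (False, False), (True, True), (False, False),
]
groups = frame_percentages(responses)
# Overall percentage correct in the Avida-ED context, regardless of tobacco.
avida_total = groups["both"] + groups["Avida-ED only"]
```

The "only" groups are the ones of interest for transfer: a student counted under "Avida-ED only" explained the principle in one context but did not carry it over to the other.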


Fitness

Students' explanations about fitness in the context of this assessment were quite incomplete, often missing the reference to the environment and its selecting action on phenotypes. Only about 23% of students applied this principle correctly to both scenarios, and the distribution of individuals who correctly addressed only one of the two examples was more heavily weighted toward the case of tobacco evolution. Overall, nearly 50% of the students explained differential fitness based on different corolla length in different geographic areas in the context of tobacco; only 30% of students, however, explained that Avidians that could metabolize certain highly rewarding resources had higher fitness than those that could not, in the presence of those resources.

These outcomes were most likely influenced by different variables:

  (a) Students' uncertainty about the nature of phenotypic variation among Avidians may have negatively affected their explanations about fitness.

  (b) The in-class activity on tobacco was focused on measuring and understanding differential fitness.

  (c) Most students lacked the ability to transfer conceptual knowledge across the two different instances.


Discussion

Avida-ED is a versatile educational tool that can provide the basis for a great variety of learning activities about evolution. Designed to facilitate student learning about evolution by natural selection and the nature of science, Avida-ED supports inquiry-based active and cooperative learning and offers great flexibility of implementation (in the classroom, as homework, in the laboratory).

This study aimed at assessing not Avida-ED in itself, but our own use of this technology-based tool in the context of a large reformed introductory biology course. The challenge we described is the same any introductory biology teacher may encounter: that of choosing a tool that aligns with the course objectives and of designing instruction using that tool to create meaningful learning experiences for students.

Multiple Kinds of Assessment Capture the Complexity of Student Learning About Evolution

Multiple kinds of assessment, as part of routine classroom practice, capture multiple facets of student understanding (National Research Council 2001). Measuring knowledge and comprehension of concepts with selected-response instruments, such as concept inventories, tells us whether learners can recognize a correct statement among a set of possible answers. Using common misconceptions as distractors also allows us to monitor the existence and persistence of certain alternative ideas. In this respect, the CINS is a very useful tool. Our broader course learning goals, however, extend beyond concept knowledge and comprehension to include the development of analysis and application abilities. Furthermore, evolution by natural selection is not a collection of facts and concepts but a complex biological process. To assess students' understanding of natural selection, we need to measure not only their understanding of individual concepts and principles, but also their ability to connect them in meaningful ways to explain how evolution works and to recognize that the same theoretical principles of evolution apply to independent, apparently unrelated, concrete examples.

We therefore aligned other assessment tools to the CINS to complement and extend assessment of student learning. The results we described support the idea that measuring knowledge and comprehension of fundamental principles of evolution by natural selection with a multiple-choice instrument, such as the CINS, only provides a partial picture of students' understanding of the process of evolution. Other instruments are necessary if we want to detect students' ability to explain how these principles apply to different cases and domains (e.g., the Concept Frame) or to use them to articulate a short explanation of how natural selection works (as in the Dino Problem). For example, based on the CINS post-instruction results, 56% and 65% of students correctly answered questions 6 and 19 of the CINS, respectively (about the origin of variation). Many of them, however, did not correctly apply this principle to the example of wild tobacco evolution (only 24% correct answers, Concept Frame; Fig. 6), and even fewer students (19%, Fig. 5) incorporated this principle in their explanation of how natural selection works in the context of the Dino Problem.

Nehm and colleagues' important work on measuring knowledge of natural selection (Nehm and Reilly 2007; Nehm and Schonfeld 2008) very thoroughly addresses the use of multiple assessments, including the CINS and open-response instruments, to measure diversity and frequency of key concepts in student answers about evolution. We refer readers to this work for a rich discussion of advantages, disadvantages, comparability, and complementarity of different assessment forms. Our study extends the reflection on assessment beyond measuring “concept knowledge” of evolution by natural selection to assessing “process knowledge,” the ability to relate concepts and principles to each other and to incorporate these principles in coherent scientific explanations of natural phenomena. We argue in favor of multiple forms of assessment not only because of their potential for detecting different cognitive abilities (Bloom 1956), but also because the assessment approach used in instruction, and especially the use of formative assessment, is known to influence how students learn (National Research Council 2001).

We know, for instance, that learners often develop knowledge that is highly contextualized and therefore rather inflexible. Transfer—the ability to extend knowledge beyond the specific context in which it was acquired—is a fundamental aspect of expertise (National Research Council 2000). Being able to transfer knowledge means recognizing when (i.e., in what situations) it is appropriate to apply certain knowledge, understanding the principles underlying facts, and being able to move fluidly between multiple representations. By comparing side-by-side two very different instances of evolution by natural selection (Avida-ED and wild tobacco populations) in the Concept Frame, we wanted to capture the extent to which students perceived common patterns across instantiations of the same underlying principles. Organizing frames, like the Concept Frame we used in this study, have a great potential for use in the classroom as learning tools to promote transfer of knowledge (West et al. 1991). We plan on using this type of tool in the future not just for summative, but primarily for formative assessment, coupled with timely feedback, as a way to encourage students to move fluidly between different instances of evolution and between abstract principles and concrete situations.

Understanding Evolution Requires Complex Systems Thinking

Evolution teaching and learning are traditionally considered challenging for various reasons, including learners' misconceptions and beliefs or learners' and teachers' lack of understanding of the nature of science. While we cannot ignore the reality of these challenges, we also cannot ignore that evolution is difficult because it is complex. Little is known yet about what pedagogical practices or instructional tools best facilitate teaching and learning about complex systems, although some indications emerge quite clearly. Research on learning about complex systems, for instance, suggests that using computational models that generate the emergent behavior of natural systems facilitates learning and promotes cross-domain transfer of conceptual knowledge (Bodemer et al. 2005; Hmelo-Silver and Azevedo 2006; Jacobson and Wilensky 2006; Goldstone and Wilensky 2008; Evagorou et al. 2009).

Avida-ED has the potential to serve as an interactive and dynamic model of the complexity of evolution and as a unique opportunity for students to practice developing systems thinking skills, namely, those of making connections among concepts at multiple scales of biological organization and of seeing the unifying patterns across different cases of evolution by natural selection.

Avida-ED can help us “bring genetics into evolution.” We can again use the genetic origin of variation as an example. The pattern we observed in the student assessment data is that the genetic origin of variation is by and large the most “difficult” idea for students, as detected by both the CINS (Table 3) and the Dino Problem (Fig. 5). However, when we assessed understanding of evolution in Avida-ED through the Concept Frame, we noticed that a large proportion of students (about 44%) correctly understood the origin of variation in that context. This result encourages us to think that affording students more opportunities to practice transferring their conceptual understanding of evolution between Avida-ED and examples of evolution in natural systems, coupled with feedback, has the potential to help learners grasp this particularly difficult principle.

Backward Design

Applying a backward instructional design model means approaching the process of activity design and assessment scientifically. We described an activity for evolution teaching and learning with Avida-ED, which we designed based on a convergence of variables, including the learning goals we wanted students to achieve, the tools and resources available to us, and the pedagogies we apply in our classroom. As in all scientific endeavors, we collected and analyzed data. Accurate analysis of the assessment outcomes provided us with clear indications of what in our activity design we could improve and how to improve it.

Feedback from students' work in the course, prior to the Avida-ED activity, indicated that many students held significant misconceptions about the random origin of variation and the relative nature of fitness. Regarding the origin of variation, we observed at least two alternative conceptions: (1) that mutations (or other, otherwise unspecified, changes) occur in organisms not randomly but in response to a “need to adapt” to their environment and (2) that the environment itself is somehow the “cause” or origin of such changes in organisms. In other words, many students do not see natural selection as acting on existing genetic variation but rather as its source. Regarding fitness, our observation was that students tend to view fitness as an absolute attribute of an individual, much in the way they think of physical characteristics such as strength, speed, or agility. Although students were able to correctly define fitness as being a measure of reproductive potential, they appeared to have difficulty thinking of fitness as a relative property that varies as a function of multiple environmental variables. These observations, which resulted directly from classroom assessment (pretests, in-class group work and discussions, homework, etc.), together with our existing learning objectives, guided the design of the Avida-ED activity and assessments.

During the Avida-ED activity, the instructor interacted with students as they worked in their groups; this interaction is extremely important, as it allows the instructor to note students' behavior and attitudes. Careful observation adds an affective component to quantitative assessment data that contributes to informing instructional design. While working with Avida-ED in the classroom, students were highly engaged, with some citing a “video game” quality that made them curious about “what would happen next.” However, the 80-minute class session was insufficient for students to both complete the activity and continue to explore questions that arose naturally as “what if” scenarios. In addition, a number of questions about the interface or interpretation of observations arose within the working groups that required instructor intervention. In a class of nearly 200, it was not possible to address all student questions in a timely or efficient fashion, and this left some students feeling frustrated by the experience. Moreover, the limited amount of time available for the classroom activity significantly constrained our ability to take advantage of the full potential of Avida-ED for authentic inquiry. The activity we designed, although open-ended, did not include students designing experiments or testing hypotheses. Similarly, our activity did not explicitly address student learning about the nature of science.

We concluded that, to fully exploit the potential of learning with Avida-ED, students should work with the software in a setting with a much lower student–teacher ratio, for a period longer than a single class meeting, and perform multiple exercises (including experiments of their own design). These conclusions would naturally indicate small-size upper-division courses as ideal settings for Avida-ED-based evolution teaching and learning. Still, we sought to incorporate Avida-ED in our large-enrollment introductory biology course, based on its potential for supporting learning about evolution, complex systems thinking, authentic inquiry, and understanding of the nature of science. In subsequent iterations of the course, we moved instruction with Avida-ED from the classroom to the laboratory part of the course, as a scaffolded two-week exercise. In week one, students completed a guided activity similar to the one described in this report. Smaller laboratory sections of 32 students afforded time and access to instructor resources (graduate teaching assistants and undergraduate learning assistants) for asking and answering questions arising within working groups. The second week, following from the previous guided activity, students posed original questions about evolution, designed experiments to test their questions using Avida-ED, and collected and analyzed their results. Assessment included a written assignment in which students discussed their experimental findings as well as their interest in and experience with the Avida-ED software. Analysis of the assessment data will inform us of the effectiveness of our revised design.