Background

Students have well-documented problems with understanding evolutionary concepts, including natural selection (e.g., Gregory 2009), tree thinking (e.g., Baum et al. 2005; Meir et al. 2007; Perry et al. 2008) and genetic drift (Andrews et al. 2012). With genetic drift, in particular, students struggle with the concept of randomness (Garvin-Doxas and Klymkowsky 2008) and also often confuse it with other evolutionary processes, such as mutation (Andrews et al. 2012). Because understanding genetic drift requires a sophisticated understanding of both genetics and evolution, it may only emerge later in a student’s biology education (Andrews et al. 2012). Price et al. (2014) developed the Genetic Drift Inventory (GeDI) as a tool to assess different instructional strategies for teaching genetic drift. This assessment is composed of a series of agree/disagree statements, each of which is associated with either a key concept required for a complete understanding of genetic drift or a misconception that often interferes with that understanding.

The Genetic Drift and Bottlenecked Ferrets module (Herron et al. 2014) is a computer-based instructional tool designed to teach genetic drift; it is built around a simulation of a black-footed ferret (Mustela nigripes) population. The module was developed with pedagogical approaches that have been demonstrated to be effective (e.g., American Association for the Advancement of Science 2011; NGSS Lead States 2013; Couch et al. 2015): students begin constructing their understanding of genetic drift by observing and recording data from simulations of how allele frequencies change in small populations, draw inferences and construct explanations from their observations, challenge and build their understanding by making and testing predictions about the consequences of genetic drift on populations, and ultimately apply their knowledge by creating a plan to reintroduce ferrets into wild populations while maintaining genetic diversity.

Computer simulations like the Ferrets module can be effective tools to enhance traditional instruction in science (reviewed in Rutten et al. 2012; Smetana and Bell 2012). Simulations allow students to visualize processes like genetic drift that occur over timescales that are difficult or impossible to observe directly, and also allow students to isolate and manipulate parameters that influence the outcome of the simulation, in order to better understand the many variables and their interactions (National Research Council 2011). Because they allow direct observation and investigation at timescales that generally are not feasible for students to investigate in nature, simulations can be particularly appropriate learning tools for evolutionary phenomena (Perry et al. 2008; Bray Speth et al. 2009; Abraham et al. 2009; Abraham et al. 2012).

Because the Bottlenecked Ferrets module was designed independently from the GeDI, the module does not specifically target all of the key concepts and misconceptions identified by Andrews et al. (2012) or tested in the GeDI (Price et al. 2014). This slight misalignment makes the GeDI a particularly powerful independent measure of the effectiveness of the module. Additionally, because the GeDI includes some concepts that the Ferrets module does not explicitly teach (Table 1), we are able to explore changes in student understanding of aspects of genetic drift that are not covered explicitly in the module.

Table 1 Coverage of key concepts and misconceptions about genetic drift in the GeDI and the Ferrets module

Andrews et al. (2012) proposed a model for how students’ understanding of genetic drift emerges during instruction, with three stages that were identified from studying college students’ misconceptions about genetic drift. They defined misconceptions as understanding that is not scientifically accurate, and we follow that convention in this study as well (Crowther and Price 2014; Leonard et al. 2014). Students in Stage 1 have a novice understanding of both genetics and evolution, and they struggle to use basic vocabulary correctly or to indicate conceptual understanding. These students use terms like genetic and evolution vaguely, e.g., “Genetic drift [is] when it’s the same species but different characteristic” (Andrews et al. 2012: 252). Students in Stage 2 are beginning to recognize that there are different mechanisms of evolution, but they often confuse these mechanisms, frequently trying to explain everything as natural selection, e.g., “Genetic drift occurs to eliminate the less adaptable trait that is not well suitable to the environment” (Andrews et al. 2012: 252). Stage 3 describes the misconceptions students hold as they develop their conceptual understanding of genetic drift, e.g., that genetic drift only occurs in small populations. Andrews et al. (2012) suggested that students move through these three stages sequentially.

In this study we used the GeDI to evaluate the effectiveness of the Genetic Drift and Bottlenecked Ferrets module (Herron et al. 2014) at teaching genetic drift. We compared pre- and post-instruction GeDI scores of students who completed the Ferrets module to GeDI scores of students who learned genetic drift through lectures and/or other activities introduced by their instructors. We used the results to reinterpret the three stages of learning genetic drift proposed by Andrews et al. (2012).

Methods

Genetic Drift and Bottlenecked Ferrets module

The SimBio Virtual Labs® module Genetic Drift and Bottlenecked Ferrets (Herron et al. 2014) is a learning module built around a series of interactive simulations. Instructions, tables for recording data, and questions are in an accompanying workbook. The module is designed to demonstrate and explore the causes and consequences of genetic drift, including conservation implications, using the example of endangered black-footed ferrets (Mustela nigripes). The interactive simulation models a population of ferrets that vary in coat color, a fictitious single-locus trait with two selectively neutral alleles.

The module has four exercises that guide students through simulations to explore (1) the relationship between sampling error and the founder effect; (2) the gene pool as a source of sampling error; (3) the effect of population size on the magnitude of changes in allele frequency and the likelihood of an allele becoming fixed in a population; and (4) the factors contributing to inbreeding depression. Throughout the module, students are asked to make predictions, record data and observations from repeated runs of the simulations, draw inferences from their observations and data, and construct explanations for why their predictions were or were not supported. In a final exercise, students are asked to apply their knowledge to an open-ended challenge: creating a ferret reintroduction plan that is likely to maintain the genetic diversity of the newly founded wild population. They can design reserves of various sizes, with or without connecting corridors, and run simulations to assess their designs using ferrets of known genotypes from a zoo. Readers wishing to see the Genetic Drift module can request a review copy from info@simbio.com.

Genetic drift inventory (GeDI)

The GeDI is a concept inventory designed to measure undergraduate students’ understanding of key concepts in genetic drift and to diagnose misconceptions around that topic (Table 1; Price et al. 2014). The test has 22 true/false items, phrased as agree/disagree, that relate to vignettes describing scenarios in which genetic drift took place. For example, one item targets the concept that “The processes leading to genetic drift tend to cause a loss of genetic variation within populations over many generations” (Price et al. 2014: 71): students evaluate a vignette and then indicate whether a biologist would agree or disagree with the statement that “The island population likely has fewer alleles—that is versions of genes—than the mainland population” (Supplementary Materials in Price et al. 2014). Seven of the items target key concepts of genetic drift, while the remaining 15 items target misconceptions (Table 1).

Treatments

We recruited courses for the two treatments separately: for the control courses, we called on our network of colleagues who have some familiarity with the literature on evolution education; for the module courses, we called on colleagues who already use SimBio Virtual Labs®. In the control treatment, students received traditional instruction (control courses); in the experimental treatment, students completed the Ferrets module (module courses) in addition to traditional instruction. For both treatments, students completed the GeDI (Price et al. 2014) before instruction on genetic drift began, and completed it again after instruction ended. We excluded students who did not complete both the pretest and the posttest, students who took the assessment three times, and courses that allowed students to work in groups on the test. In all of the module courses and three of the five control courses, the version of the GeDI used in this study was altered slightly from the one published by Price et al. (2014) by adding the phrase “State whether you agree or disagree” to Stem E; the control courses from the Research University, Midwest and the Moderate Research University, West were the only courses that used the original wording. Permission to use data from students was granted by Institutional Review Board Approval 42,505 from the University of Washington, New England Independent Review Board Protocol 14–131, and California State University, Fullerton Institutional Review Board Approval 13_0473. All students in the study consented to participate and consented to have de-identified data published.

Control courses

In the control courses instructors taught genetic drift as they normally would, through combinations of readings from the text, homework, lecture, discussion, and in-class activities. The control group included five courses and a total of 315 students (Table 2; mean class size 63, SD 57); two of these were large courses (>100) and three were small (≤28). Three of the courses were general biology, another was an upper division genetics course, and the last was an upper division evolution course (Table 2). The number of days devoted to instruction and the form of instruction varied across courses. The mean time between the pre- and posttest for four institutions was 20 days (SD 6), because testing was intended to surround instruction specifically about genetic drift. In the other institution (Moderate Research University, West), the instructor administered the pre- and posttest in the second and final weeks of the 17-week semester, respectively (Table 2). In all of the control courses, students received credit for completion of the GeDI, but not for the correctness of their answers on the test.

Table 2 Courses in the control treatment

Module courses

Instructors of the module courses used the Ferrets module as part of their instruction on genetic drift as an in-class or in-lab activity, as a homework assignment, or as a combination of both. For these courses, the GeDI was incorporated into the module as a pretest to be completed before beginning the module and as a posttest to be completed after it. Instructors varied in the timing of when they assigned the pre- and posttests: some assigned the entire lab, including the pre- and posttest, as a single assignment, while others assigned them to be completed separately. All instructors administered the pre- and posttests within a two-week period. The module courses included a total of 19 classes composed of 510 students (mean course size 27, SD 13; Table 3). All of the classes were small to medium-sized (range 10–57).

Table 3 Courses in the module treatment

Instructors varied with respect to the number of days of instruction they devoted to genetic drift, whether they provided additional instruction beyond the Ferrets module, and whether they gave credit for correctness or completion of answers on the posttest (Table 3). Six of the courses were upper division. The rest were lower division general biology courses (100 or 200 level), and four of these were aimed at non-majors. Two of the institutions in the module treatment were community colleges.

Data analysis

We calculated item difficulty on the pretest across both treatments by dividing the number of correct responses for each item by the total number of responses for that item (Crocker and Algina 1986) in order to compare our populations with those reported previously (Price et al. 2014). We used generalized linear mixed models (GLMMs), fit with the glmer function from the lme4 package (Bates et al. 2014) in R 3.1.3 (R Core Team 2014), for the rest of our analyses.
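
To make the item-difficulty calculation concrete, the following R sketch computes it from a long-format table of responses. The data frame, column names, and toy data are hypothetical stand-ins for illustration, not our actual analysis code:

    # Item difficulty = proportion of correct responses per item
    # Toy long-format data: one row per student-item response
    responses <- data.frame(
      item    = rep(c("I1", "I2", "I3"), each = 4),
      correct = c(1, 0, 1, 1,  0, 0, 1, 0,  1, 1, 1, 0)
    )
    item_difficulty <- tapply(responses$correct, responses$item, mean)
    print(item_difficulty)  # I1 = 0.75, I2 = 0.25, I3 = 0.75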

To begin, we investigated whether the student samples in control and module courses were equivalent by comparing student performance on the GeDI before instruction (pretest) between control and module courses. We found minor levels of overdispersion in this dataset. Overdispersion, when the variance is greater than expected under a given model, is a common attribute of data in a GLMM and can increase the probability of a Type I error (Crawley 2013). We accounted for the overdispersion by including a term in our GLMM for an observation-level random effect (OLRE; Harrison 2014). We also included Course as a random factor, to help account for differences in instructor, class size, institution, and implementation among non-independent groups of students. Thus, our first model included three predictor variables for pretest performance: Treatment (fixed factor), Course (random factor), and the OLRE (random factor) (Table 4).
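
As an illustration of this model structure, the following lme4 sketch fits the pretest model to simulated data; the column names and data are hypothetical, not our actual dataset or code:

    library(lme4)  # provides glmer()

    set.seed(1)
    d <- data.frame(
      Treatment = rep(c("control", "module"), each = 60),
      Course    = factor(rep(1:6, each = 20)),
      Correct   = rbinom(120, size = 22, prob = 0.58)  # of 22 GeDI items
    )
    d$OLRE <- factor(seq_len(nrow(d)))  # one level per observation

    # Pretest performance: Treatment (fixed), Course and OLRE (random)
    m_pre <- glmer(cbind(Correct, 22 - Correct) ~ Treatment +
                     (1 | Course) + (1 | OLRE),
                   family = binomial, data = d)
    summary(m_pre)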

Table 4 Summary of statistical tests used

Next, we compared how students performed on the posttest with a model that included Pretest score (fixed), Treatment (fixed), Course (random) and OLRE (random) as predictor variables (Table 4). Because each item in the GeDI targets either a single key concept or a single misconception, we used the same model to compare how students performed between treatments on the items in the GeDI that targeted key concepts essential to understanding genetic drift and items that targeted misconceptions about genetic drift (Table 1). We used Holm-Bonferroni corrections to account for the fact that we conducted multiple analyses of the same sets of data (Holm 1979), resulting in an alpha of 0.025 for the first comparison and an alpha of 0.05 for the second comparison of posttest performance.
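
Continuing the sketch above, the posttest model adds pretest score as a fixed effect, and the Holm-Bonferroni correction is available through base R’s p.adjust(); the simulated posttest scores and p values here are again purely illustrative:

    d$Pretest <- d$Correct / 22   # proportion correct before instruction
    d$Post    <- rbinom(nrow(d), size = 22, prob = 0.62)

    m_post <- glmer(cbind(Post, 22 - Post) ~ Pretest + Treatment +
                      (1 | Course) + (1 | OLRE),
                    family = binomial, data = d)
    summary(m_post)

    # Holm-Bonferroni adjustment over two comparisons (made-up p values)
    p.adjust(c(0.004, 0.018), method = "holm")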

We conducted additional analyses to investigate sources of variation between control and module courses that could have affected the outcome of our main analysis. Because the form of credit (completion vs. correctness on the GeDI posttest) varied among the module courses, but not the control courses, we constructed two additional GLMMs. The first compared posttest performance in module courses in which students received credit for completion of the GeDI (8 courses) with posttest performance in module courses in which students received credit for correctness on the GeDI (11 courses), controlling for course differences; we found no significant difference in posttest performance. We then re-ran the full model but excluded those module courses that gave credit for correctness on the GeDI. The results of that analysis did not differ from those of the full dataset, so we continued with the full model.

We had relatively large differences in mean class size between the control (mean 63, SD 57) and module (mean 27, SD 13) courses. We investigated the impact of class size on our results in two ways. First, we constructed an additional GLMM that included only the module courses with class sizes of 30 or greater. This reduced the number of module courses to eight, and narrowed the difference in mean class size between the treatment groups. The results were largely similar to the analysis with the full dataset, with the exception that the effect of Treatment on the performance of students on questions about misconceptions, which was small in the GLMM using the entire data set (p = 0.018; Table 4), was no longer significant (p = 0.065). Second, we divided the sample of module classes in half and compared the mean class size of the nine lowest-performing classes (26.8) to that of the ten highest-performing classes (26.9). Taken together, the results of these two tests suggest that class size is not responsible for the differences we found between treatments, so we continued our analysis with all of the classes.

We used Cohen’s d, a standardized measure of the difference between means, to calculate the strength of the effect of instruction within each course, as well as the effect of treatment across courses (Cohen 1988; Sullivan and Feinn 2012). Cohen’s d, in a sense, is a measure of the degree of overlap between the comparison groups. A Cohen’s d of 0 indicates complete overlap between treatments (i.e., the module students performed equivalently to control students), while a Cohen’s d of 3 would indicate that nearly all members of one treatment scored above the mean of the other (i.e., module students far outperformed control students). The larger the absolute value of Cohen’s d, the less the overlap between groups, and the stronger the effect of treatment. A negative Cohen’s d indicates a change in the direction opposite that of a positive Cohen’s d.

We calculated effect size in two ways: by looking at the effect of instruction on GeDI performance within each course and by looking at the effect of treatment on change in performance on the GeDI. To determine the effect of instruction within each course, we calculated Cohen’s d as [(average posttest − average pretest)/(pooled SD of pre- and posttest scores)]. We used Cohen’s d, instead of normalized learning gains, because the calculation for normalized learning gains does not account for students who have perfect scores on both pre- and posttest, nor for students whose scores decrease on the posttest (Miller et al. 2010). We then averaged Cohen’s d across all courses in each treatment to estimate the average effect of instruction within each treatment.

We then estimated the strength of the effect of treatment on student learning by calculating Cohen’s d from the average change in test scores across treatments: [(average change in score in module courses − average change in score in control courses)/(pooled SD of change in score across all courses)]. This approach allowed us to account for each student’s change in performance from pretest to posttest when estimating the magnitude of the effect of treatment on performance on the GeDI.
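
Both effect-size formulas translate directly into R. The sketch below assumes per-student vectors of scores; the function names are ours, for illustration only:

    # Pooled standard deviation of two groups
    pooled_sd <- function(x, y) {
      sqrt(((length(x) - 1) * var(x) + (length(y) - 1) * var(y)) /
             (length(x) + length(y) - 2))
    }

    # Within a course: effect of instruction on GeDI scores
    cohens_d_within <- function(pre, post) {
      (mean(post) - mean(pre)) / pooled_sd(pre, post)
    }

    # Across treatments: effect of treatment on pre-to-post change scores
    cohens_d_between <- function(change_module, change_control) {
      (mean(change_module) - mean(change_control)) /
        pooled_sd(change_module, change_control)
    }

    cohens_d_within(pre = c(12, 14, 13, 11), post = c(15, 16, 14, 15))  # toy scores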

To compare pre- and posttest scores visually, we calculated the proportion of items about key concepts and misconceptions that students answered correctly in each class. We then graphed the average proportion of correct answers by treatment. To highlight how students in the two treatments differed on each item of the GeDI, we calculated the change in percent correct for each item by course, and then graphed the averages of those differences by treatment.
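
The aggregation behind these figures can be sketched in a few lines of base R; the simulated per-class proportions and column names below are hypothetical:

    set.seed(2)
    by_class <- expand.grid(
      Course   = factor(1:4),
      ItemType = c("key concept", "misconception"),
      Test     = c("pre", "post")
    )
    by_class$Treatment    <- ifelse(as.integer(by_class$Course) <= 2,
                                    "control", "module")
    by_class$prop_correct <- runif(nrow(by_class), 0.4, 0.8)

    # Mean proportion correct by treatment, item type, and test occasion
    aggregate(prop_correct ~ Treatment + ItemType + Test,
              data = by_class, FUN = mean)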

Results

Students’ performance on the GeDI before instruction

Students in both treatments performed similarly on the GeDI prior to instruction (Fig. 1; Table 4). The mean proportion of correct responses on the GeDI pretest across all students in the five control courses and 19 module courses was 0.58 (SD 0.09), which falls on the low end of the range of GeDI performance reported earlier (Price et al. 2014). Item difficulty ranged from 0.30 to 0.80, closely matching the range of difficulty seen during GeDI development (Price et al. 2014).

Fig. 1

The mean change in proportion correct on items about key concepts and misconceptions on the GeDI between pre- and posttest for the control courses (N = 5) and the module courses (N = 19). Error bars are standard error. Students in the module courses improved on items about both misconceptions and key concepts. In contrast, students in the control courses improved only on items about misconceptions; their performance dropped on items about key concepts after instruction

Effect of treatment on students’ performance

Students who completed the Ferrets module showed significant increases in performance on the GeDI after instruction. The mean proportion of correct responses on the GeDI increased from 0.60 (SE 0.02) to 0.70 (SE 0.03). This increase is due to improved performance on items about both key concepts and misconceptions (Fig. 1). Furthermore, students who completed the Ferrets module significantly outperformed (p < 0.001) those in the control courses when accounting for pretest scores and random effects associated with each course (Figs. 1, 2; Table 4). This outperformance occurred both on items about key concepts (p < 0.001) and on items about misconceptions (p < 0.05). In the control courses, mean performance on items about misconceptions improved from 0.49 (SE 0.03) to 0.56 (SE 0.03) (Fig. 1). However, the mean performance on items about the key concepts was significantly worse after instruction, falling from 0.62 (SE 0.04) to 0.43 (SE 0.11) (Fig. 1). Together this indicates that mean student performance in the control group was poorer on the GeDI after instruction, dropping from 0.53 (SE 0.03) to 0.51 (SE 0.03) (Fig. 1).

Fig. 2

The mean change in % correct between pre- and posttest for the control courses (light bars, N = 5) and the module courses (dark bars, N = 19). Error bars are standard error. Items (I) from the GeDI are ordered on the x-axis by key concept (KC; a) and misconception (M; b, c). The concepts tested by some of the items in the GeDI were not covered in the Ferrets module (key concepts 4b and 4c and the misconceptions in c; see also Table 1); these are highlighted in red

For control courses, the average Cohen’s d was −0.04 (SD 0.2); the negative value indicates that, on average, the pretest scores were slightly higher than posttest scores (Additional file 1: Table S1). Therefore, instruction in control courses had a minimal impact on student performance. In comparison, the average Cohen’s d across courses in the module treatment was 0.63 (SD 0.59), demonstrating that instruction with the module had a sizable positive effect on student performance (Additional file 1: Table S1). Moreover, the effect size of treatment on the change of scores was quite large: 1.63 (Cohen’s d). This result indicates a strong positive effect of the Ferrets module on student performance relative to control course students.

The improvement of students in the module courses was not limited to content explicitly covered in the Ferrets module. Key concepts 4b and 4c (Fig. 2a; Table 1) are not incorporated into the Ferrets module workbook or simulations. However, students in the module courses improved on items aligned with those concepts, while fewer students in control courses answered those items correctly. A similar pattern occurred with items about misconceptions 7, 8, and 10; these are not covered in the Ferrets module (Table 1), but students in the module courses improved their performance on most of these items, and outperformed students in control courses on many of them (Fig. 2c; Table 1).

Discussion

We found that the students in the module courses consistently outperformed students in the control courses on the posttest, and that the effect of treatment on GeDI posttest performance was quite large. Specifically, students in the module courses showed a marked improvement on the GeDI, while scores of students in the control courses decreased after instruction. This decrease is driven primarily by poorer performance on items about key concepts (Fig. 1). In this Discussion, we begin by acknowledging the limitations of our experimental design, and explain why these limitations do not diminish our findings. We then discuss why we believe students in the control courses did poorly, and why students in the module courses did well. Our interpretation rests on the premise that the quantity and quality of time spent engaging students in making observations, collecting data, and constructing and testing predictions through a computer simulation provide a particularly robust learning environment. In the last part of the Discussion, we propose a revision to the learning framework that others (Andrews et al. 2012) have proposed for how students learn genetic drift.

Limitations of the experimental design

It is surprising that the scores of students in the control courses dropped after instruction. Here we consider aspects of our experimental design that may have contributed to this finding. Ultimately, we conclude that these aspects did not bias our results toward this unusual discovery.

Were students in the module courses more motivated to do well on the GeDI?

Students in the control courses did not receive credit for their scores on the GeDI, whereas most of the students in the module courses did. The students in the module courses who received credit for the number of items they answered correctly might have been more motivated to work harder, and do better, on the assessment (Wise and DeMars 2005; but see Couch and Knight 2015 for an opposing point of view). We accounted for this possibility by running two additional analyses. The first compared performance within the module courses, between courses in which credit was assigned for correctness and courses in which credit was assigned only for completion; we found no difference between the groups. The second compared the control courses to the eight module courses that assigned credit for completion; the results matched those from the full data set. Furthermore, we found no difference in pretest scores between the control courses and module courses, suggesting that the populations were relatively similar before instruction. Therefore, we think that the difference in post-instruction GeDI performance between treatments is unlikely to be due to differences in student motivation.

How does small class size affect our results?

The average size of the courses in the module treatment was smaller than in the control courses. Eleven of the module courses had class sizes of less than 30; although three of the control courses also had class sizes of less than 30, the other two courses in the control treatment were larger than any of the classes in the module treatment (Tables 2, 3). Since small course sizes may impact learning, some of the difference in performance between the treatments could be due to the fact that module courses were smaller. We tested for this possibility with two additional analyses. The first was a GLMM that compared the control courses to the eight module courses with class sizes of 30 or greater, narrowing the difference in average class size between treatments. The results of this model differed only in that the significant effect of treatment on posttest performance on the items about misconceptions was lost; it did not change the significant differences in overall performance or on items about key concepts (Fig. 1; Table 4). In the second additional analysis, we saw no discernible relationship between class size and posttest performance: the average class sizes in the module treatment were essentially identical between the nine lowest-performing classes and the remaining ten classes. Therefore, we suggest that the difference between control and module treatments on the GeDI post-instruction is unlikely to be due to the smaller class sizes in the module treatment.

How does time on task affect our results?

The increase in performance across most key concepts and misconceptions may be due to the fact that students in the module courses spent more time studying genetic drift. Beyond the qualitative differences between the instructional module and common classroom activities, the module takes approximately 2 h to complete (Table 3). We did not quantify the time devoted to genetic drift in the control courses, but it is likely that students in the module courses took part in instruction on genetic drift for longer than many of the students in the control courses.

In addition to differences in time on task, the quality of time spent working actively with genetic drift also differed. We recognize that most instructors do not have the classroom time to devote to this type of prolonged active engagement and hypothesis testing, nor do they have the time to develop modules that engage students as efficiently as the Ferrets module does. Given these limitations on instructors’ time, it is reasonable to interpret the increased performance of students in the module courses as due in part to the fact that those students spent more time actively solving problems about genetic drift. A major advantage of the Ferrets module is that it has already been developed, through careful implementation of thoughtful pedagogical practices (e.g., American Association for the Advancement of Science 2011; NGSS Lead States 2013; Couch et al. 2015).

Did differences in time between pre- and posttesting affect our conclusions?

The time between pre- and posttest was much longer in the control courses than in the module courses. One interpretation could be that the students in module courses performed better because there was less time between their pre- and posttests. This explanation would predict that the performance of students in the control courses either did not change, or did not increase as much as it did in the module courses. However, we found that the mean performance of students in the control courses actually dropped—a result that is not consistent with elapsed time being the explanation. We also found that the course with the longest time between pre- and posttest (mean 97 days, SD 4) showed a slight increase from pre- to posttest scores. Therefore, we conclude that the time between pre- and posttest is not sufficient to explain our results.

Performance in the control courses

Although performance among students in the control and Ferrets module courses did not differ significantly on the pretest, it did differ between treatments on the posttest (Figs. 1, 2; Table 4). Students in the module courses improved on the posttest, but students in the control courses performed significantly worse on the posttest because their performance on items about key concepts decreased (Fig. 1). Students in the control courses showed a decrease in their understanding of three of the key concepts associated with genetic drift (as defined by Price et al. 2014; Table 1; Fig. 2a): (1) that genetic drift can lead to a loss in genetic variation, (2) that the magnitude of drift’s effects is governed by a population’s effective size, and (3) that genetic drift works simultaneously with—and can overwhelm—other evolutionary processes. However, students in the control classes increased their performance on the key concept that “random sampling error happens every generation” (Price et al. 2014: 71).

We know of only one other study that looked specifically at students’ understanding of genetic drift before and after instruction (Andrews et al. 2012). In that study, introductory students answered an open-ended question that required them to consider whether genetic drift could explain a shift in genotype. On the pretest, only 1 % of the 85 students referred to genetic drift, and even those comments were so vague that they could not be evaluated. After instruction, 21 of the 122 students referred to genetic drift, but only 13 indicated some knowledge of what genetic drift actually does (Table 2 in Andrews et al. 2012).

Andrews et al. (2012) used their results to propose a framework that describes how students acquire knowledge about genetic drift through three stages, one of which is learning to recognize genetic drift as distinct from other evolutionary processes, such as natural selection, mutation, and migration (Stage 2 in Fig. 3). For example, Item 6 on the GeDI asks students whether a biologist would agree or disagree with the (incorrect) statement that “The fact that individuals that were best suited to the environment had a higher rate of survival contributed to genetic drift” (Supplementary Materials in Price et al. 2014). In our study, we find a large increase in performance on this item for students in both the control and module treatments (Fig. 2b), indicating that they are in Stage 2. Because it is so challenging for students to recognize that evolution encompasses more than natural selection (Price and Perez 2016), the fact that the students in the control courses are making this change is noteworthy.

Fig. 3

Revised hypothetical framework for how students learn genetic drift. Unlike the framework proposed by Andrews et al. (2012) in which students progressed through Stage 1, then 2, then 3, this revision shows that students can be simultaneously addressing Stage 2 and Stage 3, depending on the kind of instruction they receive

We postulate that students in the control courses are still in Stage 2, because, even though they recognize the existence of different evolutionary processes, they continue to be confused by the distinctions between them, and they are often distracted by vocabulary. For example, students might confuse genetic drift with gene flow, perhaps because they confound the word drift with the idea of migration (Andrews et al. 2012); on Item 18 of the GeDI, students are asked whether a biologist would agree or disagree with the (incorrect) statement that “Since there was no migration there could be no genetic drift” (Supplementary Materials in Price et al. 2014). Although students in the module courses improved on this item, performance on it did not change after instruction among the students in the control classes (Fig. 2c). This indicates that some confusion over vocabulary persists through instruction.

Students in the control group may be performing worse on the items about key concepts because their understanding of genetic drift is only just developing; inaccuracies may be newly incorporated into, or may already exist in, their conceptual frameworks (Stage 3 in Andrews et al. 2012). The items in the GeDI about key concepts predominantly focus on how genetic drift works and the effect that it has on a population. For example, Item 1 in the GeDI asks students whether a biologist would agree or disagree with the (correct) statement that “Genetic drift is more pronounced in the [founding] island population than the [larger] mainland population in these first few generations” (Supplementary Materials in Price et al. 2014). We suggest that what students typically learn during instruction is that genetic drift has a powerful effect in founding populations. This focus on small populations, however, can lead to the incorrect conclusion that genetic drift occurs only in small populations; students often fail to recognize that drift occurs in all real, finite populations. The fact that a misconception like this could emerge from instruction may be a natural consequence of students making sense of new ideas. Indeed, it is unlikely that students think about the situations in which genetic drift occurs before they fully understand what genetic drift is.

Price et al. (2014) suggest that items about key concepts are less difficult for students than items about misconceptions. In this study, the opposite pattern appears to hold true. One key difference between our study and theirs that might explain the opposing findings is that the testing in Price et al. (2014) was completed before instruction. It may be that, for this challenging topic, misconceptions are most difficult prior to instruction, but they are nonetheless easier to dispel than key concepts are to acquire. Moreover, all of the courses used for final testing in Price et al. (2014) were upper division courses, in which students had previously been exposed to genetic drift. It is therefore conceivable that those students were already in Stage 2 of the learning framework when they took the GeDI. The students in our control courses were primarily—but not exclusively—in general biology courses (Table 2). Future work exploring the pre- and post-instruction difficulty of items about misconceptions and key concepts could help inform a new model of student learning in genetic drift.

Performance in the module courses

The effect of the Ferrets module on student learning was substantial for a short intervention (average Cohen’s d within module courses = 0.63). Students in the Ferrets module courses significantly outperformed students in the control courses because they did better on items about key concepts, suggesting that they had a better understanding of genetic drift (Figs. 1, 2; Table 4). Their performance even improved on items about key concepts and misconceptions that were not explicitly covered in the Ferrets module, generally to a greater degree than that of students in the control courses (Fig. 2c).

The Ferrets module was designed to engage students by guiding them through the construction of their own concept of genetic drift by making observations, collecting data, and making and testing predictions. While this approach is not unique to the Ferrets module, the combination of observations and experimentation with simulations is what makes the module treatment different from the controls. Our experimental design does not allow us to determine how much each of the individual practices contributed to learning. Instead, we can only offer the evidence that the multi-faceted approach to instruction in the Ferrets module supported learning better than classroom instruction alone.

Although we cannot attribute the gains in GeDI scores to specific elements of the Ferrets module, we suggest that computer simulations support learning for topics such as genetic drift. Simulations allow students to investigate population-level phenomena that span generations, like genetic drift, which are otherwise not amenable to investigation in the classroom given the time and spatial scales involved. The visualizations available in the Ferrets module enable students to observe and experiment with several aspects of drift: that allele frequencies change randomly due to sampling error, that these changes occur every generation, and that drift occurs in populations of any finite size. In support of the suggestion that visualization may help teach random processes, Meir et al. (2005) demonstrated that simulations of osmosis and diffusion decreased misconceptions about those molecular phenomena because they allowed students to directly observe the random movement of molecules. Within the Ferrets module, students can set parameters such as population size and initial allele frequency; repeatedly test the effect of varying these parameters; and make and test predictions. Since drift is a phenomenon in which the starting conditions impact the outcome probabilistically, repeated testing and varying of parameters may help students build understanding in a way that is difficult to do with reading, lecture, and static representations. As suggested by Windschitl and Andre (1998), simulations that allow for exploration can be effective tools to overcome misconceptions and effect conceptual change. Separating the impacts of computer-based simulations and experimentation would be a topic for future research.

Some intriguing aspects of students’ performance suggest that learning during this activity is particularly sophisticated. In the learning framework hypothesized by Andrews et al. (2012), students begin to recognize different mechanisms of evolution (Stage 2) before they learn content specific to genetic drift (Stage 3); this is the pattern that we observed in students in the control courses. Students in the module courses, by contrast, were increasing their understanding of both vocabulary and genetic drift during the module (Figs. 1, 2; Table 4). Students in the module courses also improved on both key concepts and misconceptions, including some that were not directly addressed in the Ferrets module’s instructions or simulations (Table 1; Fig. 2). This approach was clearly effective.

Revised hypothetical framework for learning genetic drift

As described above, Andrews et al. (2012) hypothesized three stages for learning genetic drift: (1) undeveloped concepts of evolution and genetics at the broadest level, (2) undeveloped and overlapping concepts of different evolutionary mechanisms, and (3) developing understanding about genetic drift in particular. Our results lead us to suggest revising this framework to incorporate multiple learning pathways, rather than a linear progression through the stages (Fig. 3). The results from students in the module classes suggest that students can move from Stage 1 to either Stage 2 or Stage 3, or to both Stage 2 and Stage 3 simultaneously.

We note that Stage 1 and Stage 2 are about students’ general understanding of evolution, not specifically genetic drift. Our interpretation is that, when students, such as those in the control courses, are moving through Stage 2, they are actually expanding what they know about evolution by recognizing that many different mechanisms of evolution exist. This realization in itself is quite challenging (Price and Perez 2016). Thus, there is a misalignment between what students are learning and what instructors intend to be teaching: instructors think they are teaching genetic drift, but student thinking is revolutionized by the more basic realization that there is more to evolution than natural selection.

Conclusions

The simulation-based Genetic Drift and Bottlenecked Ferrets module is effective at teaching students about genetic drift, as measured by the GeDI. Students who used this interactive module demonstrated deeper comprehension of key concepts about genetic drift, and improved their ability to dispel misconceptions about genetic drift. In contrast, students taught using other common methods of instruction improved their ability to dispel misconceptions, but their grasp of key concepts appeared to decline. We hypothesize that the Ferrets module works in part because the lab allows students to simulate drift and visualize how identical starting points can lead to different outcomes in replicate populations. Interestingly, students improved even in areas not directly addressed by the module.

Earlier work hypothesized that as students learn about genetic drift, they more easily adopt key concepts than they dispel misconceptions, and they pass through a more-or-less linear series of stages toward a developing concept of genetic drift. This study complicates that picture. Our results suggest that as students learn about genetic drift, they simultaneously grapple with more general aspects of evolution, and they can develop new confusions that contribute to a fuzzier picture of how evolution works before and during their progression to a more expert and nuanced understanding. Students in the control groups appeared to enter a stage where their understanding of key concepts about genetic drift decreased, even as they recognized that genetic drift is an evolutionary process distinct from natural selection, migration, and random mutation. In the module courses, some students did not progress sequentially through stages of intermediate understanding, but rather developed some deeper understanding of genetic drift at the same time as they broadened awareness about evolution. This more complex model of student learning suggests that instructional materials cannot assume a particular learning trajectory, and that tools such as the Ferrets module, wherein multiple concepts and misconceptions can be addressed at once, are important aids for efficient instruction in evolution.