Assessing Long-Term Effects of Inquiry-Based Learning: A Case Study from College Mathematics
Abstract
As student-centered approaches to teaching and learning are more widely applied, researchers must assess the outcomes of these interventions across a range of courses and institutions. As an example of such assessment, this study examined the impact of inquiry-based learning (IBL) in college mathematics on undergraduates’ subsequent grades and course selection at two institutions. Insight is gained by disaggregating results by course type (IBL vs. non-IBL), by gender, and by prior mathematics achievement level. In particular, the impact of IBL on previously low-achieving students’ grades is sizable and persistent. The authors offer some methodological advice to guide future studies of this kind.
Keywords
Mathematics · Inquiry-based learning · Academic records · Grades · Course selection

Student-centered or “active” forms of instruction have been shown to improve student learning and affective outcomes in the sciences and in other fields (Ambrose et al., 2010; Froyd, 2008; Hake, 1998; Prince & Felder, 2007; Springer, Stanne & Donovan, 1999). Yet these proven, “high-impact” educational practices are not typical of what students experience in college (Kuh, 2008). Especially crucial are mathematics courses, key prerequisites that may regulate students’ access to many majors and careers, or to any college degree at all (Carlson et al., 2011; Seymour & Hewitt, 1997; Stigler, Givvin & Thompson, 2010). Thus the use of active learning methods in college mathematics may help to attract and retain students, including students of diverse backgrounds (Ellis, Rasmussen & Duncan, 2013; Fullilove & Treisman, 1990; Watkins & Mazur, 2013).
To date, the most persuasive studies of active learning have examined student outcomes within a single course at one or several institutions (e.g., Deslauriers et al., 2011; Kwon, Rasmussen & Allen, 2005; and studies analyzed by Froyd, 2008, and Ruiz-Primo et al., 2011). However, as active learning approaches are applied more broadly, evaluating their outcomes presents new methodological challenges. Measures of effectiveness must be general enough to apply across different classrooms and institutions. A common test—the most direct method of evaluating classroom learning—may not be available or applicable.
Students’ course grades and course-taking patterns—their choices to pursue (or not) subsequent courses in a discipline—offer broad and arguably objective measures for evaluating the effects of an educational intervention. While grading standards differ across instructors, courses, and campuses, grades have a fairly stable social meaning (Pattison, Grodsky & Muller, 2013). As part of students’ academic transcripts, grades become lasting records of achievement. Like grades, course-taking patterns apply to varied academic contexts; they may reflect students’ sustained or lost interest in a discipline following an initial experience. Several recent studies have used various grade and course-taking measures to evaluate the success of an educational intervention, including final grades and pass/fail rates (e.g., Dubetz et al., 2008; Tai, Sadler & Mintzes, 2006; Tien, Roth & Kampmeier, 2002), the next grade in a course sequence (e.g., Farrell, Moog, & Spencer, 1999; Gafney & Varma-Nelson, 2008), grades in multiple subsequent courses (e.g., De Paola, 2009; Weinberg, Hashimoto, & Fleisher, 2009), and enrollment in higher-level electives (Carrell & West, 2010).
Mostrom and Blumberg (2012) suggested that student-centered courses are subject to accusations of grade inflation, whether because the course has lost content or rigor or because different assessment methods enable students to do better. They also argued that grade improvement may in fact reflect real improvement in learning. Measures that focus on subsequent courses avoid this issue, because students who did and did not experience the intervention all take the same later courses. Moreover, such measures can detect valued and lasting impact on students’ learning, academic success, or academic choices (Derting & Ebert-May, 2010).
This study examined undergraduates’ grades and course-taking following an inquiry-based learning (IBL) experience in college mathematics. In the context of mathematics, IBL approaches engage students in exploring mathematical problems, proposing and testing conjectures, developing proofs or solutions, and explaining their ideas. As students learn new concepts through argumentation, they also come to see mathematics as a creative human endeavor to which they can contribute (Rasmussen & Kwon, 2007). Consistent with current socioconstructivist views of learning, IBL methods emphasize individual knowledge construction supported by peer social interactions (Ambrose et al., 2010; Cobb, Yackel & McClain, 2000; Davis, Maher & Noddings, 1990).
In this article we report our analysis of student academic records for patterns in grades and course-taking among students who had earlier taken an IBL mathematics course or a comparable “non-IBL” course taught with other methods. We focus on results for two groups often underserved by traditionally taught college mathematics courses: women and low-achieving students.
The Study
The academic records study was one element of a large, mixed-methods study of IBL mathematics as implemented at four universities hosting IBL Math Centers (Laursen, Hassi & Hough, 2013; Laursen, Hassi, Kogan, Hunter & Weston, 2011; Laursen, Hassi, Kogan & Weston, 2013). Observation, survey, interview, and test data were gathered from over 100 sections of 40 courses aimed at varied levels and audiences. First we describe results of classroom observations, which establish that IBL was a student-centered educational intervention. We then outline the methods used to study subsequent grades and course-taking for students who had completed an IBL course or its non-IBL counterpart (for details see Laursen et al., 2011).
Setting and Courses
Each of the four institutions selected and developed its IBL courses independently and labeled them as IBL or non-IBL based on instructor participation in its grant-funded IBL Center. The courses were well established, having been taught several times prior to our data collection in 2009. To check these labels and to establish whether observed differences in student outcomes were meaningful, we carried out over 300 hours of classroom observation of 42 course sections, having received human subjects approval from our university’s Institutional Review Board and that of each study site where required. The results showed that, despite variation among courses and instructors, several key characteristics differentiated the IBL courses from the non-IBL courses. On average, about 60% of class time in IBL courses was spent on student-centered activities such as small-group work, student presentation of problems at the board, or whole-class discussion, while in non-IBL courses over 85% of class time consisted of the instructor talking. In IBL courses, students more often took on leadership roles and asked more questions. Trained observers rated IBL courses higher for creating a supportive classroom atmosphere, eliciting student intellectual input, and providing feedback to students on their work. Overall, the data clearly show that students who took IBL sections experienced a different instructional approach than those in lecture-based, non-IBL sections (Laursen et al., 2011, 2013b).
For the academic records study, we selected target courses that met three criteria:
Placement early enough in a typical course sequence to allow for variation in subsequent course choices and grades,

Target sections taught in prior years, early enough that subsequent course-taking was nearly complete at the time of data collection in 2009, and

Adequate numbers of students enrolled in both IBL and non-IBL sections.
The two courses at Center L were a middle-level introduction-to-proof course (designated L1) and an advanced proof-based course (L2). L1 aimed to help students shift from the problem-solving of calculus to the rigorous proof-based approach of advanced courses. It met degree requirements for mathematics, some science and engineering fields, and secondary mathematics teaching. Course L2 was not required but counted toward the math major. Both L1 and L2 were taught in sections of 20–30 students. Course sections were not institutionally labeled as IBL or non-IBL; self-selection occurred but was not extensive. Therefore we used statistical methods to control for differences among entering students that might affect their later academic outcomes.
The third course, G1, was the first course in a three-term sequence including multivariable calculus, linear algebra, and differential equations; all three were offered in IBL and non-IBL formats. Both institutional selection and student self-selection operated heavily in this course. Students were invited to join the IBL “honors” section based on past mathematics performance, thus populating these sections with high-achieving, self-motivated students. Non-IBL sections included students of all prior achievement levels, taught in large lectures with recitations led by graduate teaching assistants. On average, IBL students had higher SAT scores and high school GPAs than non-IBL students, took G1 earlier (often in their first college term), and pursued mathematics majors in higher numbers. To compare these groups fairly, we constructed a matched sample, as detailed below.
Variables
We considered several measures by which to assess academic outcomes, beginning with anonymized raw data from standard institutional records. DFW rates, the proportion of students who fail (earn D or F grades) or withdraw (W) from a class (Dubetz et al., 2008), were not useful, since these rates are low in honors and upper-level courses. Instructors argued that grades and exam scores could not be compared across IBL and non-IBL sections, given differences in emphasis and assessment. Instead, we developed standardized approaches to counting and averaging grades in courses taken after the target course. Because students who took IBL or non-IBL sections of the target course later co-enrolled in the same subsequent courses, their later grades can be compared directly to each other, albeit not across courses or institutions. Here we outline the standardized variables.
Four course-counting variables were defined:
Number of prior math courses—before the target course, a control for math background;

Number of subsequent math courses—all courses taken after the target course;

Number of subsequent elective courses—elective courses taken after the target course, other than core courses required for the mathematics major; and

Number of subsequent IBL courses—IBL-method courses taken after the target course.
The number of subsequent required courses is largely determined by students’ progress in the major and is not a useful measure of student choice. Major-switching into or out of mathematics was minimal for all groups in these courses.
Three grade variables were also defined:
Average prior grade—in courses before the target course, a control for math achievement;

Average grade in the next term; and

Average grade in subsequent elective, required, and IBL courses.
Table 1 Study samples for academic records analyses

Course; control for incoming differences  Total  IBL  Non-IBL  Class year  Gender  Race & ethnicity  Major
L1: Mid-level; statistical  1341  211  1130  23% soph, 28% jr, 39% sr  71% M, 29% F  52% white, 21% Asian, 12% Hispanic  60% math, 30% S&E
L2: Upper-level; statistical  909  123  786  26% jr, 52% sr  65% M, 35% F  51% white, 19% Asian, 16% Hispanic, 10% foreign  71% math, 18% S&E
G1: Introductory; sampling + statistical  197 (of 962)  49  98  72% first, 24% soph  62% M, 38% F  64% white, 18% Asian, 10% Hispanic  27% math, 55% S&E
Table 2 Estimated marginal means for average grade and courses taken subsequent to an IBL or non-IBL mathematics course, for all students and by gender, for three courses
Course L1, 2001–2008  non-IBL  IBL  
men  women  all  men  women  all  
Sample size (overall & all course counts)  755  322  1077  147  57  204 
Average grade in the next term  
N  351  175  526  66  23  89 
Mean  2.816  2.726  2.786  2.970  2.922  2.957 
standard deviation  0.918  0.926  0.917  0.926  0.921  0.925 
Average grade in subsequent required courses  
N  336  163  499  67  37  104 
Mean  2.626  2.635  2.629  2.846  2.706  2.796 
standard deviation  0.917  0.945  0.916  0.925  0.918  0.918 
Average grade in subsequent elective courses  
N  507  218  725  89  41  130 
Mean  2.790  2.909  2.826  2.936  2.963  2.945 
standard deviation  0.878  0.886  0.862  0.877  0.871  0.878 
Average grade in subsequent IBL courses  
N  51  29  80  22  8  30 
Mean  2.523  2.534  2.528  3.033  2.826  2.975 
standard deviation  1.000  1.007  0.975  1.008  0.973  0.997 
Sig., IBL vs. non-IBL  *  *  
Effect size for IBL intervention  0.456  
Number of subsequent required courses  
Mean  0.473  0.586  0.507  0.529  0.792  0.604 
standard deviation  0.632  0.646  0.624  0.630  0.627  0.628 
Sig., IBL vs. non-IBL  *  *  
Sig., IBL vs. non-IBL, within gender  *  *  
Effect size for IBL intervention  0.320  0.155  
Sig., men vs. women  **  **  *  *  
Number of subsequent elective courses  
Mean  1.768  1.887  1.803  1.675  1.551  1.641 
standard deviation  1.841  1.848  1.805  1.831  1.820  1.828 
Number of subsequent IBL courses  
Mean  0.059  0.075  0.064  0.137  0.142  0.138 
standard deviation  0.275  0.269  0.263  0.267  0.264  0.271 
Sig., IBL vs. non-IBL  ***  ***  
Sig., IBL vs. non-IBL, within gender  **  **  
Effect size for IBL intervention  0.285  0.280  
Course L2, 2002–2008  non-IBL  IBL  
men  women  all  men  women  all  
Sample size (overall & all course counts)  477  270  747  77  40  117 
Average grade in the next term  
N  204  122  326  30  11  41 
Mean  2.498  2.736  2.588  2.716  3.039  2.797 
standard deviation  1.000  0.994  0.993  0.997  1.002  0.999 
Average grade in subsequent required courses  
N  92  38  130  17  5  22 
Mean  2.214  2.418  2.274  2.434  3.014  2.564 
standard deviation  1.045  1.042  1.038  1.047  1.042  1.046 
Average grade in subsequent elective courses  
N  289  157  446  48  23  71 
Mean  2.642  2.708  2.665  2.579  2.924  2.690 
standard deviation  0.918  0.915  0.908  0.908  0.911  0.910 
Average grade in subsequent IBL courses  
N  15  8  23  6  1  7 
Mean  2.194  3.286  2.564  2.205  4.574  2.568 
standard deviation  0.918  0.939  1.093  0.926  0.928  1.101 
Number of subsequent required courses  
Mean  0.051  0.022  0.041  0.019  0.031  0.023 
standard deviation  0.197  0.197  0.191  0.193  0.196  0.195 
Number of subsequent elective courses  
Mean  1.399  1.482  1.429  1.337  1.338  1.338 
standard deviation  1.594  1.610  1.585  1.588  1.600  1.590 
Number of subsequent IBL courses  
Mean  0.020  0.010  0.016  0.038  0.001  0.024 
standard deviation  0.131  0.148  0.137  0.140  0.139  0.141 
Course G1, 2004–2006  non-IBL (matched sample)  IBL  
men  women  all  men  women  all  
Sample size (overall & all course counts)  61  37  98  28  19  47 
Average grade in the next term  
N  47  21  68  24  15  39 
Mean  2.935  3.132  2.995  3.477  3.279  3.402 
standard deviation  0.912  0.926  0.907  0.916  0.906  1.001 
Sig., IBL vs. non-IBL  *  *  
Sig., IBL vs. non-IBL, within gender  *  *  
Effect size for IBL intervention  0.593  0.430  
Average grade in subsequent required courses  
N  46  24  70  24  16  40 
Mean  3.022  3.095  3.046  3.209  3.026  3.137 
standard deviation  0.794  0.813  0.786  0.794  0.788  0.791 
Average grade in subsequent elective courses  
N  17  7  20  9  7  16 
Mean  3.052  2.975  3.029  3.498  2.325  2.979 
standard deviation  1.064  0.958  0.997  0.945  0.937  1.000 
Sig., men vs. women  *  *  
Average grade in subsequent IBL courses  
N  3  0  3  21  14  35 
Mean  3.419    3.416  3.602  3.317  3.469 
standard deviation  0.461    0.476  0.454  0.456  0.467 
Number of subsequent required courses  
Mean  2.040  1.815  1.958  2.300  1.788  2.088 
standard deviation  1.453  1.490  1.445  1.460  1.447  1.460 
Number of subsequent elective courses  
Mean  0.771  0.475  0.661  0.996  1.500  1.197 
standard deviation  1.953  1.995  1.841  1.958  1.940  1.954 
Number of subsequent IBL courses  
Mean  0.042  0.019  0.034  1.203  0.898  1.206 
standard deviation  0.461  0.474  0.465  0.466  0.462  0.466 
Sig., IBL vs. non-IBL  ***  ***  
Sig., IBL vs. non-IBL, within gender  ***  ***  ***  ***  
Effect size for IBL intervention  2.51  1.87  2.52 
Table 3 Estimated marginal means for grades and course-taking subsequent to an IBL or non-IBL mathematics course, by prior achievement level, for one course
Course L1, 2001–2008  Non-IBL  IBL  

Low  Medium  High  Low  Medium  High  
Sample size (overall & all course counts)  360  353  364  49  76  79 
Average grade in the next term  
N  186  184  156  18  36  35 
Mean  2.064  2.885  3.438  2.427  2.924  3.680 
std. deviation  0.941  0.922  0.937  0.925  0.924  0.935 
Sig., L vs. M  ***  ***  
Sig., M vs. H  ***  ***  ***  ***  
Sig., L vs. H  ***  ***  **  **  
Average grade in subsequent required courses  
N  180  162  157  24  39  41 
Mean  1.959  2.584  3.360  2.429  2.748  3.377 
std. deviation  0.939  0.929  0.940  0.931  0.931  0.941 
Sig., IBL vs. non-IBL  *  *  
Sig., L vs. M  ***  ***  
Sig., M vs. H  ***  ***  **  **  
Sig., L vs. H  ***  ***  **  **  
Average grade in subsequent elective courses  
N  235  253  237  30  45  55 
Mean  2.195  2.778  3.442  2.344  2.963  3.521 
std. deviation  0.889  0.891  0.893  0.887  0.885  0.897 
Sig., L vs. M  ***  ***  **  **  
Sig., M vs. H  ***  ***  **  **  
Sig., L vs. H  ***  ***  ***  ***  
Average grade in subsequent IBL courses  
N  37  21  22  7  8  15 
Mean  1.571  3.053  3.224  2.992  3.263  3.420 
std. deviation  1.010  0.999  0.999  0.997  0.998  1.026 
Sig., IBL vs. non-IBL  *  *  
Sig., L vs. M  ***  ***  
Sig., L vs. H  ***  ***  
Number of subsequent required courses  
Mean  0.495  0.531  0.494  0.513  0.622  0.648 
std. deviation  0.626  0.639  0.630  0.630  0.628  0.631 
Number of subsequent elective courses  
Mean  1.572  1.914  1.909  1.379  1.651  1.868 
std. deviation  1.840  1.822  1.832  1.820  1.822  1.840 
Sig., L vs. M  **  **  
Sig., L vs. H  *  *  
Number of subsequent IBL courses  
Mean  0.077  0.054  0.061  0.122  0.105  0.179 
std. deviation  0.266  0.263  0.267  0.266  0.270  0.267 
Sig., IBL vs. non-IBL  ***  *** 
To account for students’ prior achievement or ability entering the first-year course G1, we created an index combining students’ high school GPA with college admissions test scores. Concordance tables were used to convert ACT to SAT mathematics scores and ACT English and reading scores to SAT verbal scores (Dorans et al., 1997; Dorans, 1999, 2004). Because high school grades and admissions test scores are comparably important predictors of college success (Hoffman & Lowitzki, 2005; Noble, 1991), the index weighted high school GPA, math SAT score, and verbal SAT score approximately equally. The new index was divided into seven equal brackets.
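The construction of such an index can be sketched as follows. The rescaling and exactly equal weights here are illustrative assumptions, not the study’s actual formula:

```python
def precollege_index(hs_gpa, sat_math, sat_verbal):
    """Combine high school GPA (0-4 scale) and SAT section scores
    (200-800 scale) into one index, weighting each part roughly
    equally by first rescaling it to the 0-1 range. Illustrative only."""
    gpa_part = hs_gpa / 4.0
    math_part = (sat_math - 200) / 600.0
    verbal_part = (sat_verbal - 200) / 600.0
    return (gpa_part + math_part + verbal_part) / 3.0

def index_bracket(index, n_brackets=7):
    """Assign a 0-1 index value to one of n equal-width brackets (1..n)."""
    return min(int(index * n_brackets) + 1, n_brackets)
```

ACT scores would first be converted to the SAT scale via the concordance tables before entering such a function.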
Sampling and Analysis
Table 1 summarizes the demographic information for samples from the three courses, totaling 3,212 students. Non-IBL samples were larger because IBL section offerings were limited. To develop a non-IBL sample comparable to the selective population in IBL sections of G1, we used the pre-college index plus demographic variables to match students. For each IBL student we selected two non-IBL students matched by index bracket, academic major (math, science, non-STEM, undeclared), academic status (freshman–senior), gender, and race/ethnicity—in that order of priority. Overall, this process yielded highly similar IBL and non-IBL samples.
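One simple way to implement such priority-ordered matching is sketched below. The field names and the greedy, two-per-student selection are our assumptions for illustration, not the study’s actual procedure:

```python
# Priority order of matching attributes, highest first (assumed field names).
PRIORITY = ["bracket", "major", "status", "gender", "ethnicity"]

def match_score(ibl, cand):
    """Count attributes matched in priority order, stopping at the first
    mismatch so that higher-priority attributes dominate the score."""
    score = 0
    for field in PRIORITY:
        if ibl[field] != cand[field]:
            break
        score += 1
    return score

def build_matched_sample(ibl_students, non_ibl_students, k=2):
    """For each IBL student, greedily pick the k best-matching non-IBL
    students, sampling without replacement from the remaining pool."""
    pool = list(non_ibl_students)
    matched = []
    for s in ibl_students:
        pool.sort(key=lambda c: match_score(s, c), reverse=True)
        picks, pool = pool[:k], pool[k:]
        matched.extend(picks)
    return matched
```

A production version would also need tie-breaking rules and a fallback when no candidate matches the top-priority bracket.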
SPSS (version 18) was used for statistical analyses. To compare means for IBL and non-IBL students we used primarily nonparametric tests (Mann-Whitney, Kruskal-Wallis, chi-square), as most of the data were not normally distributed. We found some incoming differences between IBL and non-IBL student groups in the number of math courses and average prior math grade. For L1, IBL students had taken fewer prior math courses and earned higher average math grades prior to the target course. For L2, these differences were not significant. For G1, even after our close-match sampling, there was still a significant difference in the number of prior math courses, with IBL students taking fewer. Thus all reported results are based on applying the General Linear Model (GLM) procedure in SPSS to control for these incoming differences, using as covariates the number of math courses and average prior math grade, or for G1, the pre-college index. We report estimated marginal means, which adjust the group means for the effects of the covariates.
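For intuition, the estimated marginal mean can be computed by hand in the one-covariate case: each group’s raw mean outcome is shifted to the value it would take if the group sat at the grand mean of the covariate, using the pooled within-group regression slope. This is a minimal plain-Python sketch of the ANCOVA adjustment that SPSS’s GLM procedure generalizes to several covariates:

```python
def within_slope(groups):
    """Pooled within-group regression slope of outcome y on covariate x.
    `groups` is a list of (xs, ys) pairs, one pair per group."""
    num = den = 0.0
    for xs, ys in groups:
        mx = sum(xs) / len(xs)
        my = sum(ys) / len(ys)
        num += sum((x - mx) * (y - my) for x, y in zip(xs, ys))
        den += sum((x - mx) ** 2 for x in xs)
    return num / den

def estimated_marginal_means(groups):
    """Adjust each group's mean outcome to the grand mean of the covariate."""
    all_x = [x for xs, _ in groups for x in xs]
    grand_x = sum(all_x) / len(all_x)
    b = within_slope(groups)
    return [sum(ys) / len(ys) - b * (sum(xs) / len(xs) - grand_x)
            for xs, ys in groups]
```

With prior math grade as the covariate, a group that entered with weaker preparation has its subsequent-grade mean adjusted upward, and vice versa, so the reported means compare like with like.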
Effect sizes for the IBL intervention were computed from estimated marginal means and pooled standard deviations for all students and by gender. A different approach was required for effect sizes by prior achievement group. The GLM procedure adjusts the post-intervention student outcomes by controlling for prior math GPA. Because the achievement subgroups are based on prior math GPA, using these adjusted outcome measures to calculate effect sizes by achievement group would obscure precisely the group differences of interest. Instead, we used Morris’ (2008) Pretest-Posttest-Control group design. This method controls for pre-existing differences even when treatment and control groups are nonequivalent, by allowing “each individual to be used as his or her own control, which typically increases the power and precision of statistical tests” (p. 365). This design is appropriate only for the grade variables, as the number of math courses taken prior to the intervention does not have a pretest relationship to subsequent course counts; it is instead a measure of student preparedness.
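The core of the pretest-posttest-control effect size is the difference in mean gains divided by the pooled pretest standard deviation. The sketch below omits Morris’s small-sample bias-correction factor for brevity:

```python
import math

def morris_d(pre_t, post_t, pre_c, post_c,
             sd_pre_t, sd_pre_c, n_t, n_c):
    """Pretest-posttest-control effect size (after Morris, 2008):
    difference of mean gains (treatment minus control) divided by the
    pooled pretest standard deviation. Bias correction omitted."""
    sd_pool = math.sqrt(((n_t - 1) * sd_pre_t ** 2 +
                         (n_c - 1) * sd_pre_c ** 2) / (n_t + n_c - 2))
    return ((post_t - pre_t) - (post_c - pre_c)) / sd_pool
```

In this study the “pretest” would be a group’s average prior math grade and the “posttest” its average subsequent grade, with IBL students as the treatment group and non-IBL students as the control.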
Results
The IBL status of the target course is the primary independent variable by which grades and course-taking are compared, using the variables defined above. We first describe the results for all IBL vs. non-IBL students, then disaggregate results by gender and by prior achievement level.
All Students
When the number of courses taken is examined (Figure 1b), little difference is seen for the more advanced students in L1 and L2. Both IBL groups took modestly (but not significantly) fewer elective courses. Students who took L1 in IBL format were more likely to opt for a second IBL course, while students in L2 had little opportunity to take additional IBL courses.
Among the first-year students in course G1, however, IBL students pursued more math courses, especially IBL courses (of which three more were available). The difference in elective course counts is large, though not significant due to the small sample and high variance. Because the IBL and non-IBL groups were well matched by major, the difference in elective choice is not due to differing requirements. The difference in students’ pursuit of IBL courses is significant.
By Gender
Comparing student grades by gender, some differences are large, at several tenths of a grade point, though not statistically significant. In the advanced course L2, women in both IBL and non-IBL groups tended to outperform their male classmates. For the mid-level course L1, there was little difference in men’s and women’s subsequent grades, while men in the early course G1 tended to outperform women in their own (IBL or non-IBL) sections.
Portraying numbers of courses by gender and IBL status, Figure 2b shows that both male and female IBL students pursued further IBL courses at higher rates. Significant differences versus non-IBL students are found for both L1 (men) and G1 (men and women). IBL women from G1 also persisted significantly longer, taking on average one more elective course than their non-IBL female peers. This pattern is not mirrored in the more advanced courses L1 and L2.
By Achievement Group
In interviews conducted with instructors as part of the broader study, instructors proposed that IBL would particularly benefit students with weaker academic backgrounds. The strongest students, they felt, would enjoy the challenge of IBL but would succeed in both IBL and non-IBL settings. Based on these hypotheses, we disaggregated the data for IBL and non-IBL students by prior mathematics achievement level. We present results from L1 only. L2 students took too few subsequent courses to support further division of the sample, and this analysis held no meaning for G1, where all students were high achievers. For L1, we empirically divided students into three achievement subgroups: low, GPA < 2.5; medium, GPA 2.5–3.4; and high, GPA > 3.4, taking care to match the underlying distributions for IBL and non-IBL samples.
Table 4 Effect size for the IBL intervention on grades subsequent to an IBL or non-IBL mathematics course, by prior achievement group
Course L1, 2001–2008  Prior achievement group  

Grade variable  Low  Medium  High 
Average of all subsequent math grades  0.56  0.29  0.35 
Average grade in the next term  0.65  0.04  1.28 
Average grade in subsequent required courses  0.90  0.55  0.18 
Average grade in subsequent elective courses  0.16  0.63  0.45 
Average grade in subsequent IBL courses  3.11  0.68  1.19 
We found few differences in course-taking by achievement level (Table 3). There was one statistically significant difference: high achievers who had taken an IBL section of L1 took more IBL courses than did their non-IBL peers. This finding matches instructors’ expectations that high achievers would find the IBL method stimulating.
Discussion
Overall, the effect of IBL on students’ subsequent grades and course-taking was modest when IBL and non-IBL students were compared as whole groups. Certainly no harm was done; IBL students succeeded at least as well as their peers in later courses. This result challenges instructors’ common concern that material omitted to accommodate the slower pace of IBL courses may hinder student success in later courses (Yoshinobu & Jones, 2012).
IBL students also tended to take additional IBL courses when available. While this study controlled for differences in prior mathematics background and achievement, other factors also affect students’ choice of an IBL or non-IBL section: learning beliefs, professor choice, peer influences, and even the time of day the section is offered. Our analyses may not fully separate pre-selection from causal effects linking pursuit of further IBL courses to a good IBL experience.
Positive effects of IBL on students’ pursuit of further mathematics courses held for both men and women. These effects are detected among courses where student choice may be most apparent: electives and courses taught with IBL methods. The results by gender are particularly interesting given our findings from immediate post-course survey data: in non-IBL courses, women reported significantly lower learning gains than did men (Laursen et al., 2011, 2013a, b). This gender gap persisted across several types of intellectual (e.g., conceptual learning, problem-solving) and affective (confidence, interest) gains, though there were no actual differences in men’s and women’s grades. That is, women in non-IBL courses succeeded at similar rates to men, but reported less mastery and lower confidence at the end of the course. The present analysis shows that non-IBL women also persisted in mathematics at lower rates.
In IBL courses, however, women reported intellectual and affective gains similar to men’s on surveys (Laursen et al., 2011), and their grades were no different. This analysis indicates that IBL women were also more likely to persist in mathematics. Enhanced persistence was apparent following G1, a course early in the curriculum, while after courses L1 and L2 such effects were less detectable, as students had fewer terms left in which to adjust their major or course choices. Moreover, IBL experiences may matter more early in undergraduates’ careers (Watkins & Mazur, 2013), as also suggested by the higher gains reported by first- and second-year IBL students vs. upperclassmen (Laursen et al., 2011). Women’s apparent grade improvement relative to their male peers from lower-division to advanced courses may suggest that women who persist to advanced courses are high achievers who also have high tolerance for their minority status.
Disaggregated by prior achievement, differences in students’ grades and course-taking patterns became apparent. Taking an IBL course did not erase achievement differences among students, but it did flatten them. In non-IBL courses, initial patterns of achievement difference were preserved; previously low-achieving students gained no ground.
Figure 3a compares students to each other, while Figure 3b compares students to their own prior performance. Low achievers’ performance was boosted after taking an IBL course, relative both to their own previous performance and to their non-IBL peers. Differences of 0.3–0.5 grade points are meaningful to students’ future academic options.
The differing impact of IBL on women and low achievers shows that the intervention functions differently for these two groups. For women, the impact of IBL appears to be primarily affective; it is not permanent. IBL courses offer features that are known to be effective for women, including collaborative work (Springer, Stanne & Donovan, 1999), problem-solving, and communication (Du & Kolmos, 2009), and that may enhance women’s sense of belonging to the discipline (Good, Rattan & Dweck, 2012). Public sharing and critique of student work may serve as vicarious experiences that enhance self-efficacy (van Dinther, Dochy & Segers, 2011) and link effort, rather than innate talent, to mathematical success (Good, Rattan & Dweck, 2012). For low-achieving students, however, the effect is longer-lasting. We propose that IBL experiences promote what one student called “fruitful struggle,” thereby strengthening transferable problem-solving strategies and study habits. For students who do not already have these skills, this is a powerful and lasting impact (Hassi & Laursen, 2013).
This study also yields some methodological insight. Overall, grades and course-taking choices are blunt instruments for detecting the impact of an educational intervention. Because these outcomes and their meaning varied substantially by student subgroup, disaggregating results was essential—but this necessitated large samples. Comparing patterns in results across three courses yielded insight about the relative impact of IBL experiences on students at different academic stages. The methods used are entirely general and not specific to mathematics courses.
The utility of academic records analysis was understandably sensitive to the nature and timing of the target course. Effects on subsequent grades and course-taking were most easily detected in courses earlier in the curriculum. Self- and institutional selection required the use of stringent controls, which in turn required large samples. Variation by prior achievement could not be studied in the G1 course because all students were strongly prepared. Results could be rigorously compared only within a single course, not across courses or institutions. Finally, the analysis required substantial up-front work to gather institutional records, transform data, and define and compute standardized variables. In sum, academic records analysis is not a tool to be applied lightly, yet techniques like these may yield insight for studies of multi-course or multi-site educational reform in cases where these design constraints can be accommodated.
Conclusion
College instructors using student-centered methods in the classroom are often called upon to provide evidence of the educational benefits of their approach—an irony, given that traditional lecture approaches have seldom undergone similar evidence-based scrutiny. Our study indicates that the benefits of active learning experiences may be lasting and significant for some student groups, with no harm done to others. Importantly, “covering” less material in inquiry-based sections had no negative effect on students’ later performance in the major. Evidence for increased persistence is seen among the high-achieving students whom many faculty members would most like to recruit and retain in their departments. Thus these results should be useful to instructors seeking evidence to persuade colleagues and students of the value of their approach.
This work raises many interesting questions for future studies. The differential benefits of inquiry learning experiences for low-achieving students highlight their potential to help overcome historical inequities for other groups, such as students of color and first-generation college students, groups we could not examine in this study. Comparison of longitudinal effects will be especially interesting in cases where inquiry courses are offered to first-year students and where multiple inquiry experiences are offered within a single program or institution. While the quantitative approach reported here establishes patterns of student achievement and persistence after an active-learning course, only mixed-methods approaches can both document such effects and reveal the reasons for them.
Acknowledgments
The Educational Advancement Foundation supported this work. We thank Clint Coburn and Tim Archie for research assistance and Andy Cameron, Michele Keeler, and Steven Velasco for access to the data.
References
Ambrose, S. A., Bridges, M. W., DiPietro, M., Lovett, M. C., & Norman, M. K. (2010). How learning works: Seven research-based principles for smart teaching. San Francisco, CA: Jossey-Bass.
Carlson, M., Rasmussen, C., Bressoud, D., Pearson, M., Jacobs, S., Ellis, J., et al. (2011). Surveying mathematics departments to identify characteristics of successful programs in college calculus. In S. Brown, S. Larsen, K. Marrongelle, & M. Oehrtman (Eds.), Proceedings of the 14th Annual Conference on Research in Undergraduate Mathematics Education, Vol. 3, pp. 333–338. Portland, OR. Retrieved from http://sigmaa.maa.org/rume/RUME_XIV_Proceedings_Volume_3.pdf
Carrell, S. E., & West, J. E. (2010). Does professor quality matter? Evidence from random assignment of students to professors. Journal of Political Economy, 118, 409–432.
Cobb, P., Yackel, E., & McClain, K. (Eds.). (2000). Symbolizing and communicating in mathematics classrooms: Perspectives on discourse, tools, and instructional design. Mahwah, NJ: Lawrence Erlbaum Associates.
Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Hillsdale, NJ: Lawrence Erlbaum Associates.
Davis, R. B., Maher, C. A., & Noddings, N. (Eds.) (1990). Constructivist views of the teaching and learning of mathematics. Journal for Research in Mathematics Education, Monograph No. 4. Reston, VA: National Council of Teachers of Mathematics.
De Paola, M. (2009). Does teacher quality affect student performance? Evidence from an Italian university. Bulletin of Economic Research, 61, 353–377.
Derting, T. L., & Ebert-May, D. (2010). Learner-centered inquiry in undergraduate biology: Positive relationships with long-term student achievement. CBE-Life Sciences Education, 9, 462–472.
Deslauriers, L., Schelew, E., & Wieman, C. (2011). Improved learning in a large-enrollment physics class. Science, 332, 862–864.
Dorans, N. J. (1999). Correspondence between ACT and SAT I scores. College Board Research Report 99-1. New York, NY: The College Board.
Dorans, N. J. (2004). Equating, concordance, and expectation. Applied Psychological Measurement, 28, 227–246.
Dorans, N. J., Lyu, C. F., Pommerich, M., & Houston, W. M. (1997). Concordance between ACT Assessment and recentered SAT I Sum scores. College and University, 73, 24–35.
Du, X., & Kolmos, A. (2009). Increasing the diversity of engineering education: A gender analysis in a PBL context. European Journal of Engineering Education, 34, 425–437.
Dubetz, T., Barreto, J. C., Deiros, D., Kakareka, J., Brow, D. W., & Ewald, C. (2008). Multiple pedagogical reforms implemented in a university science class to address diverse learning styles. Journal of College Science Teaching, 38(2), 39–43.
Ellis, J., Rasmussen, C., & Duncan, K. (2013). Switcher and persister experiences in Calculus 1. Sixteenth Annual Conference on Research in Undergraduate Mathematics Education. Denver, CO. Retrieved from http://pzacad.pitzer.edu/~dbachman/RUME_XVI_Linked_Schedule/rume16_submission_93.pdf
Farrell, J. J., Moog, R. S., & Spencer, J. N. (1999). A guided-inquiry general chemistry course. Journal of Chemical Education, 76, 570–574.
Froyd, J. E. (2008). White paper on promising practices in undergraduate STEM education. Commissioned paper, Board on Science Education, National Academies. Retrieved from http://sites.nationalacademies.org/DBASSE/BOSE/DBASSE_080106#.UUoV5hngJ8g
Fullilove, R. E., & Treisman, P. U. (1990). Mathematics achievement among African American undergraduates at the University of California, Berkeley: An evaluation of the Mathematics Workshop Program. Journal of Negro Education, 59, 463–478.
Gafney, L., & Varma-Nelson, P. (2008). Peer-Led Team Learning: Evaluation, dissemination and institutionalization of a college-level initiative. New York, NY: Springer.
Good, C., Rattan, A., & Dweck, C. S. (2012). Why do women opt out? Sense of belonging and women's representation in mathematics. Journal of Personality and Social Psychology, 102, 700–717.
Hake, R. R. (1998). Interactive-engagement vs. traditional methods: A six-thousand-student survey of mechanics test data for introductory physics courses. American Journal of Physics, 66, 64–74.
Hassi, M.-L., & Laursen, S. L. (2013). Transformative learning: Personal empowerment in learning mathematics. Manuscript in review.
Hoffman, J. L., & Lowitzki, K. E. (2005). Predicting college success with high school grades and test scores: Limitations for minority students. The Review of Higher Education, 28, 455–474.
Kuh, G. (2008). High-impact educational practices: What they are, who has access to them, and why they matter. Washington, DC: American Association of Colleges and Universities.
Kwon, O. N., Rasmussen, C., & Allen, K. (2005). Students' retention of mathematical knowledge and skills in differential equations. School Science and Mathematics, 105, 1–13.
Laursen, S., Hassi, M.-L., Kogan, M., Hunter, A.-B., & Weston, T. (2011). Evaluation of the IBL Mathematics Project: Student and instructor outcomes of inquiry-based learning in college mathematics. (Report to the Educational Advancement Foundation and the IBL Mathematics Centers) Boulder, CO: University of Colorado, Ethnography & Evaluation Research. Available at http://www.colorado.edu/eer/research/steminquiry.html
Laursen, S. L., Hassi, M.-L., & Hough, S. (2013a). Inquiry-based learning in mathematics content courses for preservice teachers. Manuscript submitted for publication.
Laursen, S. L., Hassi, M.-L., Kogan, M., & Weston, T. J. (2013b). From innovation to implementation: Multi-institution pedagogical reform in undergraduate mathematics. Manuscript submitted for publication.
Morris, S. B. (2008). Estimating effect sizes from pretest-posttest-control group designs. Organizational Research Methods, 11, 364–386.
Mostrom, A. M., & Blumberg, P. (2012). Does learning-centered teaching promote grade improvement? Innovative Higher Education, 37, 397–405.
Noble, J. P. (1991). Predicting college grades from ACT assessment scores and high school course work and grade information. ACT Research Report Series 91-3. American College Testing Program. Retrieved from http://www.act.org/research/researchers/reports/pdf/ACT_RR9103.pdf
Pattison, E., Grodsky, E., & Muller, C. (2013). Is the sky falling? Grade inflation and the signaling power of grades. Educational Researcher, 42, 259–265.
Prince, M., & Felder, R. (2007). The many facets of inductive teaching and learning. Journal of College Science Teaching, 36(5), 14–20.
Rasmussen, C., & Kwon, O. (2007). An inquiry-oriented approach to undergraduate mathematics. Journal of Mathematical Behavior, 26, 189–194.
Ruiz-Primo, M. A., Briggs, D., Iverson, H., Talbot, R., & Shepard, L. A. (2011). Impact of undergraduate science course innovations on learning. Science, 331, 1269–1270.
Seymour, E., & Hewitt, N. M. (1997). Talking about leaving: Why undergraduates leave the sciences. Boulder, CO: Westview Press.
Springer, L., Stanne, M. E., & Donovan, S. (1999). Measuring the success of small-group learning in college-level SMET teaching: A meta-analysis. Review of Educational Research, 69, 21–51.
Stigler, J. W., Givvin, K. B., & Thompson, B. J. (2010). What community college developmental mathematics students understand about mathematics. MathAMATYC Educator, 1(3), 4–16.
Tai, R. H., Sadler, P. M., & Mintzes, J. J. (2006). Factors influencing college science success. Journal of College Science Teaching, 36(1), 52–56.
Tien, L. T., Roth, V., & Kampmeier, J. A. (2002). Implementation of a peer-led team learning instructional approach in an undergraduate organic chemistry course. Journal of Research in Science Teaching, 39, 606–632.
van Dinther, M., Dochy, F., & Segers, M. (2011). Factors affecting students' self-efficacy in higher education. Educational Research Review, 6, 95–108.
Watkins, J., & Mazur, E. (2013). Retaining students in science, technology, engineering and mathematics (STEM) majors. Journal of College Science Teaching, 42(5), 36–41.
Weinberg, B. A., Hashimoto, M., & Fleisher, B. M. (2009). Evaluating teaching in higher education. The Journal of Economic Education, 40, 227–261.
Yoshinobu, S., & Jones, M. G. (2012). The coverage issue. PRIMUS, 22, 303–316.
Copyright information
Open Access This article is distributed under the terms of the Creative Commons Attribution License which permits any use, distribution, and reproduction in any medium, provided the original author(s) and the source are credited.