A multi-institutional study of the impact of open textbook adoption on the learning outcomes of post-secondary students

In some educational settings, the cost of textbooks approaches or even exceeds the cost of tuition. Given limited resources, it is important to better understand the impacts of free open educational resources (OER) on student outcomes. Utilizing digital resources such as OER can substantially reduce costs for students. The purpose of this study was to analyze whether the adoption of no-cost open digital textbooks significantly predicted students’ completion of courses, class achievement, and enrollment intensity during and after semesters in which OER were used. This study utilized a quantitative quasi-experimental design with propensity-score matched groups to examine differences in outcomes between students who used OER and those who did not. The initial sample of 16,727 students included 4909 in the treatment condition and a pool of 11,818 in the control condition. There were statistically significant differences between groups, with most favoring students utilizing OER.


Introduction
Textbooks have traditionally been an essential part of the post-secondary experience for the majority of students in the United States. In a typical scenario, a professor assigns a textbook as the core instructional material for her class; students are obligated to purchase this book and use it to study the material in preparation for each class period. While the costs of these textbooks vary, Hilton et al. (2014) found that, across a series of general education courses (including science, math, humanities, and business) at seven different colleges, the average textbook price was approximately $90.00.
While all students face high textbook costs, individuals from lower socioeconomic backgrounds may face particular difficulties. Paulsen and St. John (2002) found that low and lower-middle income students reported that the financial implications of attending college were important factors in their choices regarding college. Provasnik and Plenty (2008) reported that individuals with lower incomes are more likely to delay college enrollment than wealthier peers. For some college students, the total cost of textbooks can exceed total tuition costs (Goodwin 2011). Some students, then, may be forced either not to purchase textbooks (presumably resulting in less learning) or take fewer classes (resulting in slower time to graduation) in order to manage or reduce college costs (Buczynski 2007).
Electronic textbooks promise a more affordable option for students. Electronic textbooks typically cost less than traditional textbooks due to the lack of printing costs. Rockinson-Szapkiw et al. (2013) found that utilizing electronic textbooks did not negatively impact student cognitive outcomes.
Another, even less expensive solution to rising textbook costs can be found in the utilization of open educational resources (OER). We next present a review of literature relating to OER and studies pertaining to the perceptions and efficacy of OER.

Review of literature
The William and Flora Hewlett Foundation has defined open educational resources as: teaching, learning, and research resources that reside in the public domain or have been released under an intellectual property license that permits their free use and re-purposing by others. Open educational resources include full courses, course materials, modules, textbooks, streaming videos, tests, software, and any other tools, materials, or techniques used to support access to knowledge (Hewlett 2013).
OER materials eschew traditional copyright in favor of licenses that allow others to retain, reuse, revise, remix, and redistribute the materials (Hilton et al. 2010; Wiley et al. 2014). The vast majority of the OER utilized in this study were available for free online. Thus, digital versions could be accessed on a wide variety of devices.
Open textbooks, which are a collection of OER aggregated in a manner that resembles a traditional textbook, take many shapes and forms. Typically, free digital versions of the textbook are made available to students. In addition, students who wish to purchase print versions of the textbooks can do so, at prices as low as $5 per textbook. While the quality of open textbooks varies, many go through rigorous editorial and design processes.
Perhaps not surprisingly, students are favorably disposed towards replacing costly commercial textbooks with free open textbooks. Bliss et al. (2013) studied open textbook adoption at eight different institutions of higher education. Fifty-eight teachers and 490 students across the eight colleges completed surveys regarding their experiences in utilizing the open texts. Bliss and colleagues found that approximately 50 % of students said that the OER textbooks were of the same quality as traditional textbooks and nearly 40 % said that they were better. In their free-response comments, students focused on several benefits of the open textbooks, including cost-savings. For example, one student said, "I have no expendable income. Without this free text I would not be able to take this course." In the same study, researchers found that 55 % of teachers adopting OER reported that the open materials were of the same quality as the materials they had previously used, and 35 % felt that they were better. One teacher in the study pointed out that "The materials were free to my students, which reduced a barrier to their chances for academic success."
While Stratton et al. (2007) noted that results have been mixed in studies examining the relationship between student finances and their success in continuing through completion, several studies have indicated that greater financial resources correlate positively with student persistence. For example, Paulsen and St. John (2002) demonstrated that "the responsiveness of poor and working-class students to tuition increases is alarmingly high - reducing their probability of persisting by 16 and 19 %, respectively, per $1000 increment in tuition" (p. 229). While Paulsen and St. John did not discuss the cost of textbooks, it is interesting to note that the figure they used for an increase in tuition ($1000) is approximately the same amount of money full-time college students typically spend on textbooks per year. Thus one could argue that reducing textbook costs to zero could potentially increase persistence rates.
While not usually measured directly, it is possible that the use of no-cost or low-cost OER might free students' resources to support increased credit loads, which then enhance progress toward graduation. Wiley et al. (2015) analyzed the cost savings in courses with sections that used OER and sections that did not. The average cost of commercial textbooks across the courses was $140.85, which represented a potential total cost of $1,324,017.68 for that sample. In that instance, OER could have saved over one million dollars in textbook costs, which could have been applied directly to tuition for additional courses.
While financial reasons might be particularly persuasive to students and other educational stakeholders, the core purpose of education is to support learning. If the adoption of open textbooks decreases costs but also negatively influences student learning, educators should well view them with skepticism. While encouraging this skepticism, the authors fully acknowledge that institutions and educators everywhere trade improved affordability for lower outcomes on a regular basis. For example, colleges universally forgo providing a full-time tutor for each student. Even though Bloom's two-sigma work suggests this would greatly increase student learning, colleges instead choose to place students in educationally sub-optimal but significantly more affordable classes with many other students and a single instructor. Because this particular trade of sub-optimization for affordability is well established and broadly accepted, it is essentially invisible to many faculty. By contrast, a decrease in student learning associated with the adoption of open textbooks would be novel and likely to draw the negative attention of faculty, students, and other stakeholders. However, if learning outcomes actually improved in settings where open textbooks are utilized, there may be significant policy implications.
Perhaps because OER is relatively new, little research has been performed on how its utilization influences student learning. To date, six studies have compared student performance with and without implementing OER. These studies vary in rigor and all state that there are limitations to their findings. Nevertheless, they constitute the research done to the present time. Lovett et al. (2008) measured the efficacy of an OER statistics module in comparison with the traditional educational model at Carnegie Mellon University. In two separate semesters, they invited students who had registered for an introductory statistics class at Carnegie Mellon to participate in an experimental online version of the course. Of those who volunteered, approximately one-third were randomly selected to take the online course, while the remaining two-thirds who had volunteered became the control group. The control group took the traditional, face-to-face statistics class at Carnegie Mellon. Researchers compared the results of these two groups in fall 2005 by examining their test scores (three midterms and one final exam), and found that there was no significant difference between the two groups. This experiment was replicated in spring 2006 with the same, non-significant, result. Thus, utilizing OER resulted in cost-savings without improving (or sacrificing) learning outcomes.
In another study focused on Carnegie Mellon's open statistics modules, Bowen et al. (2014) compared the use of a traditional textbook in a face-to-face lecture class with that of a blended approach utilizing OER. Six hundred and five students took the OER version of the course, while 2439 took the traditional version. Bowen and colleagues found that, while students who utilized OER scored slightly higher than their peers on standardized exams, the difference was not statistically significant. A potential confound was that those utilizing OER received blended learning instead of traditional face-to-face instruction. Thus it is possible that the pedagogy masked the influence of OER. Nevertheless, it is relevant to note that in this study the use of OER did not lead to lower student outcomes.
In a non-experimental case study, Hilton and Laman (2012) compared the performance of 690 students using an open textbook in an introductory psychology class to the performance of 370 students who used a traditional textbook in a previous semester. They concluded that students who used the open textbook achieved better grades in the course, had a lower withdrawal rate, and scored better on the final examination. Feldstein et al. (2012) found that students in courses using open textbooks typically had higher grades and lower failure and withdrawal rates than those in courses with traditional textbooks. However, they did note significant limitations to their study suggesting that they provided only interesting data to be more rigorously pursued in the future.
Similarly, a case study presented by Hilton et al. (2013) focused on four math classes at Scottsdale Community College. These classes used the same departmental exam for each course for several years, which allowed faculty members to compare how students did on department exams when OER were used as compared with previous semesters. OER replaced traditional learning materials in fall 2012, and student results at the end of this semester were approximately the same as those obtained by students in fall 2011 and fall 2010. Pawlyshyn et al. (2013) found that when OER material was integrated into the math courses at Mercy College, student learning significantly increased. The pass rates of math courses increased from 63.6 % in fall 2011 (when traditional learning materials were employed) to 68.9 % in fall 2012 when all courses were taught with OER. Similarly, students who were enrolled in OER versions of a reading course performed better than their peers who enrolled in the same course using non-OER materials.
Recent research indicates that a majority of faculty members perceive OER to be of approximately the same quality as traditional textbooks. Allen and Seaman (2014) surveyed 2144 college professors regarding OER. Of the 34 % (729) who were aware of OER, 61.5 % indicated OER had about the same "trusted quality" as traditional resources, 26.3 % said that traditional resources were superior, and 12.1 % said that OER were superior. Similarly, 68.2 % said that the "proven efficacy" was about the same, 16.5 % said that OER had superior efficacy, and 15.3 % said that traditional resources had superior efficacy. Allen et al. (2015) studied an experimental class of 478 students that used OER known as ChemWiki for its primary textbook, while a control class of 448 utilized a commercial textbook. These two sections were taught the same semester at consecutive hours using the same faculty member and teaching assistants in order to control for potential confounds. Students in these classes received the same exams. No significant differences were found between the two groups. Beginning of the semester pre-tests combined with final exams showed no significant differences in individual learning gains between the two groups, thus indicating that OER could be substituted without any negative impact on learning.
While the aforementioned research provided interesting contextual case studies and varying degrees of statistical rigor, much more work needs to be done to ascertain the relationships between the use of OER and student academic performance. The purpose of this study was, therefore, to explore whether the use of open textbooks at 10 colleges significantly predicted learning outcomes in a group of 16,727 post-secondary students.
In the present study we sought to address the following questions:
1. Comparing students who utilize OER and those who do not, is there a difference in the number of students who complete a course?
2. Comparing students who utilize OER and those who do not, is there a difference in the number of students who pass a course with a C- or better grade?
3. Comparing students who utilize OER with those who do not, is there a difference in the course grade?
4. Comparing students who utilize OER and those who do not, is there a difference in the number of credits they take in the semester they used OER (fall)?
5. Comparing students who utilize OER and those who do not, is there a difference in the number of credits they take the semester after the one in which they utilized OER (winter)?

Participants
The initial data set consisted of 4128 students enrolled in undergraduate courses from the following 4- Minority students represented 57.5 % of the sample. Ages of students ranged from 15 to 87 with a mean of 22.63 and a standard deviation of 6.8.

Data analysis
We estimated differences between the treatment and control groups across five important outcomes: (1) rates of completion of courses, (2) rates of passing courses with a C- or better grade, (3) course grade, as measured by the numerical grade (for example, A = 4.0), (4) enrollment intensity (credit load) in fall semester when they used OER, and (5) enrollment intensity (credit load) in the following semester (winter) while controlling for credit load in fall semester. Outcomes 1 and 2 were estimated using chi-square tests of independence. Outcomes 3 and 4 were estimated using an independent samples t test. Outcome 5 was estimated using analysis of covariance (ANCOVA). Propensity score matching across the entire sample was applied to outcomes 4 and 5. Because of naturally occurring average differences in course difficulty across departments and teachers, each course was considered separately for outcomes 1-3. Propensity score matching within each course resulted in small sample sizes and therefore was not applied in outcomes 1-3.
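As an illustration of the kind of test applied to outcomes 1 and 2, the following minimal sketch computes a Pearson chi-square test of independence for a single 2 x 2 table (textbook condition by completion). The function name and the counts are hypothetical and are not taken from the study's tables; the study itself used SPSS, and whether a continuity correction was applied is not stated, so none is applied here.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square test of independence for a 2x2 table.

    Rows are conditions (e.g. OER, commercial); columns are outcomes
    (e.g. completed, withdrew). No continuity correction is applied.
    Returns (statistic, p_value) for 1 degree of freedom.
    """
    n = a + b + c + d
    row_totals = (a + b, c + d)
    col_totals = (a + c, b + d)
    observed = ((a, b), (c, d))
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed[i][j] - expected) ** 2 / expected
    # With 1 degree of freedom, the chi-square upper tail probability
    # reduces to erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value


# Hypothetical counts for one course (assumed, not from the study).
stat, p = chi_square_2x2(94, 6, 79, 21)
print(f"chi-square = {stat:.2f}, p = {p:.4f}")
```

Running one such test per course mirrors the course-by-course approach described above, at the cost of multiple comparisons across the 15 courses.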

Propensity score matching
In order to enhance the clarity of prediction of persistence outcomes based on textbook condition we used propensity score matching to create subsets of students who were statistically similar across three important covariates: age, gender, and minority status. Propensity score matching homogenizes comparison samples and reduces variance associated with covariates (Guo and Fraser 2010). Propensity score matching has been particularly helpful in educational research where random assignment is logistically difficult to achieve (see, for example, Riegle-Crumb and King 2010; Robinson et al. 2014).
Using SPSS, we used logistic regression to create propensity scores by regressing the binary treatment condition on age, gender, and minority status. We created matched samples using nearest neighbor matching within calipers (Guo and Fraser 2010). The original sample included 16,727 students with 11,818 in the control condition and 4909 in the treatment. There was a 2.4:1 ratio of available controls to be matched to treatment subjects. We used the formula $\varepsilon \le .25\,\sigma_q$, where $\varepsilon$ is the caliper and $\sigma_q$ indicates the standard deviation of the propensity scores of the original sample (Rosenbaum and Rubin 1985). This resulted in a caliper of 0.01 for this study. The initial logistic regression required subjects to have all relevant covariates (that is, no missing data). Given this requirement and the narrow caliper used in this study, the procedure matched 4147 treatment subjects with 4147 controls. Of the 4909 available treatment subjects, 762 were not included because of missing data or because there was no matching control subject within the narrow selection caliper. Propensity score matching led to improved balance in gender and minority status across groups but had little effect on age, which was relatively well matched in the original sample (see Table 1).
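The matching step can be sketched as follows, assuming propensity scores have already been estimated. This is a greedy one-to-one nearest neighbor match without replacement, with the caliper set to one quarter of the standard deviation of the scores; the function name, the subject identifiers, and the scores are hypothetical, and the exact matching order and tie-breaking used in the study's SPSS procedure are not described in the text.

```python
import statistics


def caliper_match(treated, controls, caliper_fraction=0.25):
    """Greedy 1:1 nearest-neighbor matching within a caliper.

    `treated` and `controls` map subject ids to propensity scores.
    The caliper is `caliper_fraction` times the standard deviation of
    all propensity scores. Each control is used at most once.
    Returns a list of (treated_id, control_id) pairs.
    """
    all_scores = list(treated.values()) + list(controls.values())
    caliper = caliper_fraction * statistics.stdev(all_scores)
    available = dict(controls)  # controls not yet matched
    pairs = []
    for t_id, t_score in treated.items():
        if not available:
            break
        # Closest remaining control on the propensity score.
        c_id = min(available, key=lambda cid: abs(available[cid] - t_score))
        if abs(available[c_id] - t_score) <= caliper:
            pairs.append((t_id, c_id))
            del available[c_id]
        # Otherwise the treated subject stays unmatched and is dropped.
    return pairs


# Hypothetical propensity scores (assumed for illustration only).
treated = {"t1": 0.30, "t2": 0.55, "t3": 0.90}
controls = {"c1": 0.31, "c2": 0.20, "c3": 0.56, "c4": 0.40, "c5": 0.25}
print(caliper_match(treated, controls))
```

In this toy example the third treated subject has no control within the caliper and is left out, which is the same mechanism by which some of the 762 treatment subjects were excluded from the matched sample.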

Completion
When comparing the groups within each course in terms of completion, the pattern across the 15 courses showed almost no significant differences. In two courses, Business 110 and Biology 111, students in the treatment condition showed a significantly higher rate of completion than students in the control condition. In the case of Business 110, the differences in withdrawal rates were quite clear; 21 % of students in the commercial textbook condition withdrew from the course while only 6 % of students in the OER condition withdrew from the course (see Table 2).

Passing with a C- or better grade
When comparing the groups within each course in terms of C- or better, the pattern across the 15 courses was mixed. In nine courses there were no significant differences in achievement. In five courses, students in the treatment condition were more likely to pass the course than students in the control condition. In one course, Business 110, students in the control condition surpassed students in the treatment condition in terms of the percentage who had a C- or better (see Table 2).

Course grade
When comparing the groups within each course in terms of course grade, the pattern across the 15 courses was also mixed. In 10 courses there were no significant differences in course grade. In four courses the students in the treatment condition achieved higher grades than students in the control condition. In one course, Business 110, students in the control condition received higher grades than students in the treatment condition (see Table 2).

Enrollment intensity in fall semester
An independent samples t-test was conducted to test whether there were differences between the treatment and control groups in terms of their credit loads in the fall semester when they used OER. The treatment group's mean credit load was 13.29, which was significantly higher than the control group's mean of 11.14.

Enrollment intensity in winter semester
An ANCOVA was conducted to test whether there were differences between the treatment and control groups' credit loads in winter semester while controlling for the effects of credit load in fall semester. Credit load in the fall semester was a significant covariate that needed to be controlled [F(1, 6440) = 1224.96, p < .01]. Credit load in the fall semester was held constant at 12.54. After removing the variance associated with the fall credit load, the marginal mean winter credit load for the treatment group was 10.71, while the marginal mean winter credit load for the control group was 9.16 (see Fig. 1). There remained a significant difference between the treatment and control groups in terms of credit load in the winter semester [F(1, 6440) = 154.08, p < .01].
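The idea of a marginal (adjusted) mean can be illustrated with a minimal sketch for a one-covariate model: each group's mean winter credit load is shifted along the pooled within-group regression slope to the value it would take if the group's mean fall credit load equaled the overall mean. The function name and the data below are hypothetical, the sketch assumes a common slope in both groups, and it computes only the adjusted means, not the F tests reported above.

```python
from statistics import mean


def adjusted_means(groups):
    """Covariate-adjusted group means for a one-covariate ANCOVA model.

    `groups` maps a group label to a list of (covariate, outcome) pairs.
    Each group's outcome mean is adjusted to the grand mean of the
    covariate using the pooled within-group slope.
    """
    sxx = 0.0  # pooled within-group sum of squares of the covariate
    sxy = 0.0  # pooled within-group cross-products
    all_x = []
    group_means = {}
    for label, rows in groups.items():
        xs = [x for x, _ in rows]
        ys = [y for _, y in rows]
        mx, my = mean(xs), mean(ys)
        group_means[label] = (mx, my)
        sxx += sum((x - mx) ** 2 for x in xs)
        sxy += sum((x - mx) * (y - my) for x, y in rows)
        all_x.extend(xs)
    slope = sxy / sxx
    grand_x = mean(all_x)
    return {
        label: my - slope * (mx - grand_x)
        for label, (mx, my) in group_means.items()
    }


# Hypothetical (fall credits, winter credits) pairs, assumed for illustration.
data = {
    "OER": [(12.0, 10.0), (14.0, 11.0), (16.0, 12.0)],
    "commercial": [(9.0, 7.5), (11.0, 8.5), (13.0, 9.5)],
}
print(adjusted_means(data))
```

In the toy data the raw group means differ by 2.5 credits, but part of that gap reflects the higher fall credit loads in the first group; the adjusted means differ by less once fall load is held at its overall mean.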

Discussion
This is by far the largest study of its kind conducted to date: nearly 5000 post-secondary students using OER and over 11,000 control students using commercial textbooks, distributed among ten institutions across the United States, enrolled in 15 different undergraduate courses. In three key measures of student success (course completion, final grade of C- or higher, and course grade), students whose faculty chose OER generally performed as well or better than students whose faculty assigned commercial textbooks.
In two key measures of enrollment intensity, which is an indicator of student progress toward graduation, students in courses that used OER were significantly different than students in courses with commercial textbooks. Even when controlling for differences in previous enrollment, students in courses using OER enrolled in a significantly higher number of credits in the next semester. This may be due to the cost savings associated with OER. In community college settings where tuition costs are based directly on the number of credits taken with no cap on costs for "full-time" enrollment, funds saved on textbooks can be applied directly to enrollment in additional courses.
The mechanisms underlying these improvements differ from those typically hypothesized to underlie improvement in student outcomes. Historically, comparison studies of instructional products have often been conducted to test hypotheses about differences in student outcomes attributable to alternate modes of delivery or instructional design approach (see Russell 2015). The authors do not believe differences of mode of delivery or instructional design between OER and commercial textbooks to be the primary mechanisms responsible for the differences in outcomes observed in this study. On the contrary, our informal review reveals strikingly similar, essentially equivalent instructional designs in the OER and commercial textbooks. We believe the effects demonstrated in this study result from differing degrees of access and affordability facilitated by open licenses used by OER.
The moderate differences in completion rates and final grades between the control and treatment groups are likely a function of access. Some percentage of students in the control group probably failed to purchase the commercial materials assigned by their faculty due to cost or other factors. For example, one survey suggested that 23 % of students regularly forgo purchasing required textbooks due to their high cost (Florida Virtual Campus 2012). Students' lack of access to the core instructional materials for the course put them at an academic disadvantage. All students in the treatment group had access to all the course materials from the very first day of class because they were openly licensed. Consequently, we would expect some enhanced probability of success for members of the treatment group.
The differences in enrollment intensity between the control and treatment groups are likely a function of affordability. Students whose faculty assign OER save a significant amount of money compared to students whose faculty assign commercial textbooks. Some treatment students will choose to reinvest these savings by taking an additional course in order to accelerate their graduation. Consequently, we would expect members of the treatment group to take more credits than the control group, on average.
Detecting differences in student outcomes based on access and affordability, rather than instructional design, points to several new horizons for educational research. Hundreds of millions of dollars and person-hours have been invested in improving in-class instructional designs, intelligent tutoring systems, adaptive instructional systems, and other design-related innovations intended to improve student outcomes. The current study demonstrates that at least one non-instructional design option exists that can effectively improve student outcomes.

Limitations
Although this was a robust sample size, the nesting of subjects in a relatively limited number of courses that included both treatment and control sections precluded the use of multi-level modeling, which may have helped account for some patterns in the results. Multi-level modeling might be supported when the number of nests is greater than twenty, which was not possible in this case. The examination of three important outcomes was restricted to rough course-by-course analyses. Although an overall pattern of results seems to have emerged, the course-by-course analysis is less than ideal. Within those courses, propensity score matching was contraindicated which leaves the apparent pattern more confounded than might be desired.
The propensity score matching used in the analyses of enrollment intensity may have given a clearer picture of an important outcome, enrollment intensity. Even so, propensity score matching should not be considered to be a panacea that guarantees causal claims. It does homogenize comparison groups across important confounds and enhances statements of probability. However, the number and variety of confounds in educational research are so large that, even with the most sophisticated controls, causal claims are rarely justified. In this study, conclusions should be taken as statements of enhanced probability and not causation.

Future directions
The authors hope this study will encourage others to pursue similar research. Very little is known about the efficacy of OER and additional large-scale studies of the efficacy of OER are needed. As the number of courses using OER grows within institutions, and especially as some sections may use commercial textbooks while others use OER, institutions may be able to conduct multi-level modeled designs that account for department and teacher influences on outcomes.
Thus far, outcomes have been compared with the availability of OER versus commercial textbooks. There are several important covariates to be considered. Both the OER and the commercial textbooks should be evaluated for quality. There is no guarantee that either resource would be particularly effective. The Open Textbook Initiative (http://open.umn.edu) has undertaken efforts to evaluate OER textbooks across ten dimensions. This is a very encouraging development. An additional important covariate is how much students actually used either resource. Rather than drawing conclusions about differences in outcomes from availability alone, researchers would benefit from including quality and usage as controls. Additional covariates to consider are prior student achievement and teacher effects. It may be that teachers who explore, use, and develop OER are systematically different from other teachers. Subsequent research that can approximate these covariates will result in cleaner estimates of possible differences in student outcomes between OER and non-OER sections.
This study focused on five measures of student success: course completion, final grade, final grade of C- or higher, enrollment intensity, and enrollment intensity in the following semester. Replicative studies in these areas are needed. Moreover, there are several other areas in which student success could be measured. For example, do students in classes with OER receive more "A" grades than students using traditional textbooks? Is there a quantifiable difference in how students perform on final exams based on the textbook they use? These and other similar questions could be profitably pursued. We also believe there are other opportunities to improve student outcomes that manipulate variables other than the design of the instruction, and hope this study will encourage other researchers to search for these variables.