1 Introduction

Lecture capture (LC) is the process by which live classroom lectures are recorded in a digital format and made available for students to review. The use of LC technology in higher education has steadily increased over the last 15 years, and more than four out of five U.S. higher education institutions are estimated to use some form of LC technology (Panopto, 2020).

Research has shown that students consistently report LC to be a positive technology that enhances their learning (Dommett et al., 2019; Gosper et al., 2010; Nashash & Gunn, 2013; Owston et al., 2011; Traphagan et al., 2010). Instructors, on the other hand, have a more negative perception of LC, citing concerns about reduced attendance, engagement, and interaction with students (Danielson et al., 2014; Dommett et al., 2019; O’Callaghan et al., 2017).

Despite students’ positive perception of LC, research into the impact of LC on academic performance has yielded mixed results: the majority of studies indicate either a positive or a negligible influence, and only a small number indicate that LC has a negative effect on academic performance. Previous work generally follows one of two research designs. The first comprises studies that focus on LC use and its impact on academic performance (LC use studies). These studies can be performed with a single cohort and correlate the number of lecture recording views with academic performance. Their size has varied widely, from as few as 70 students in a single course (Sloan & Lewis, 2014) to more than 7500 students across many courses (Caglayan & Ustunluoglu, 2021). The second comprises studies that focus on LC availability and academic performance (LC availability studies). These studies typically compare the academic performance of two cohorts, often separate sections of the same course, one of which has access to lecture recordings while the other does not. Some studies have combined both approaches: they first examined the effect of lecture recording availability on academic performance by comparing performance with a previous offering of the course in which LC was not available, and then examined the relationship between lecture recording use and academic performance within the LC group (Edwards & Clinton, 2019; Johnston et al., 2013; Leadbeater et al., 2013). These studies span a wide variety of disciplines and differ considerably in design and level of experimental control, which may account for the mixed results observed.

In the current study we present the results of a well-controlled LC availability study in a large biomedical science course.

2 Literature review

Many studies to date have examined the relationship between LC and academic performance. Table 1 summarizes previous literature by research design and impact on academic achievement. We focused on studies in which lecture recordings were used to supplement in-class lectures, and only studies involving both audio and video recordings were reviewed; outdated studies focusing on audio-only podcasts were not included.

Table 1 A summary of previous literature of lecture capture use in higher education categorized by research design and impact on academic performance

2.1 Lecture capture has a positive impact on academic performance

Most studies performed to date indicate that LC technology has a positive impact on academic performance. For example, 10 of 17 studies that focused on LC use and achievement were positive, while 4 of 10 studies that focused on LC availability and achievement were positive.

2.1.1 Lecture capture use and a positive impact on academic performance

The majority of LC use studies indicate a positive impact on academic performance. These studies are reviewed below in three general groups: science, business/economics, and large-scale multidisciplinary studies.

A number of studies that identified a positive impact on academic performance in response to viewing lecture recordings have been conducted in science courses (Baker et al., 2018; Brooks et al., 2014; Danielson et al., 2014; Nordmann et al., 2019).

Baker et al. (2018) analyzed data collected from 453 postgraduate students in an epidemiology course offered over four semesters. The course was offered either in class (internal) or via live stream (external), and lecture recordings were made available after each class session. Interestingly, a positive impact of LC viewing on academic performance was identified only in external students who viewed at least half of the available recordings, and not in internal students. However, attendance data were not available and cohort grade point averages (GPAs) were not reported, which may have affected the results. Brooks et al. (2014) performed an LC use study with over 1000 students in two chemistry courses and one biochemistry course. Rather than simply comparing users with non-users, this study categorized students into five groups based on the amount and timing of LC use in an attempt to identify the most effective strategy for LC use. Although the differences between groups were not statistically significant, the authors reported that the highest-use group outperformed the others by as much as 16% on some exams; this group also contained the fewest students, which likely prevented the difference from reaching significance.

Danielson et al. (2014) performed an LC use study comparing 491 veterinary students’ performance on a national board exam with their LC views over five years. This exam is divided into five scientific disciplines (anatomy, microbiology, pathology, pharmacology, and physiology), and the authors reviewed LC viewing in the courses related to each subsection of the board exam as well as overall viewing. They found a statistically significant increase in performance on the pharmacology and anatomy subsections when LC viewing was higher, but not on the other three. The authors concluded that LC may be more beneficial in disciplines that involve rote learning than in those that involve applied learning.

A number of studies conducted in the disciplines of business or economics have focused on LC use and academic performance (Aldamen et al., 2015; Chen & Lin, 2012; Jones & Olczak, 2016; Sloan & Lewis, 2014; Williams et al., 2012).

Aldamen et al. (2015) examined 254 students across five sections of an introductory accounting class and found that students who viewed the LC recordings performed slightly better than those who did not. This was a well-controlled study in which the instructor and learning materials remained nearly identical, and it considered additional variables such as GPA, attendance, and seniority. The authors noted that the improvement was limited to a section of the course offered in a later semester and suggested that students from the previous semester may have offered guidance on accessing and operating the lecture recordings, leading to the slight improvement in performance. Chen and Lin (2012) studied the impact of LC use on exam performance in a microeconomics course and found a significant positive relationship between the use of online recordings and grades when analyzing data from 312 students across three sections of the course. While the instructor remained the same for all sections and the authors accounted for student attendance, the study did not report whether the learning materials and exam questions were identical or whether GPA differed between the cohorts.

Jones and Olczak (2016) compared students who watched at least 15 min of one lecture recording with students who did not view any recordings in an economics class with an enrollment of 380 students and found a small increase in performance. Attendance was not monitored in this study, and the authors concluded that the positive impact observed may have been influenced by non-attendance.

Sloan and Lewis (2014) identified a positive association between access to lecture recordings and exam performance in two sections of an operations management course with a total of 70 students. This association persisted after controlling for previous exam performance. It is worth noting that this study used a variety of measures to increase LC viewing, such as designating certain recordings as mandatory.

Williams et al. (2012) analyzed data from 371 students in a first-year microeconomics class and found a conditional positive impact of lecture recording viewing on academic performance. The positive impact was restricted to students who reported both attending lectures and reviewing recordings; it was not observed in students who reported using the recordings primarily as a substitute for missed classes. This implies that LC viewing is beneficial when used as a supplement to live lectures.

A large-scale multidisciplinary study involving over seven thousand students examined the impact of LC viewing on academic performance (Caglayan & Ustunluoglu, 2021) and found that increased viewing correlated with higher grades for sophomore, junior, and senior undergraduate students but not for freshmen. The authors stressed the importance of making students aware of LC technology and of best practices for its use.

2.1.2 Lecture capture availability and a positive impact on academic performance

Many studies involving LC and academic performance have focused on LC availability rather than viewing. The studies reviewed below found LC availability to have a positive impact on academic performance (Dommeyer, 2017; Shaw & Molnar, 2011; Terry et al., 2015; Wiese & Newton, 2013).

Dommeyer (2017) compared the academic performance and attendance of 56 students with LC available to 28 students without LC available in separate sections of a marketing research course. Although the two cohorts had similar cumulative GPAs, the students with LC available had greater attendance and outperformed the non-LC group on all aspects of the course.

Shaw and Molnar (2011) compared the grades of 60 medical students in a biochemistry course with LC available to those of 53 students in the previous year’s offering, in which LC was not available. Using each student’s biology score on the Medical College Admission Test (MCAT) as a control, the authors found a significant improvement in student performance when LC was available. Further analysis revealed that the improvement was much greater among non-native English speakers.

Terry et al. (2015) created an empirical model to test student performance in response to LC availability in several business courses with a total enrollment of 890 students. The model included many variables, such as GPA, age, gender, and Scholastic Assessment Test (SAT) scores, and the authors found that students performed significantly better when LC was made available.

Wiese and Newton (2013) compared the grades of 597 students in a third-year nutrition course with LC available to those of 890 students in two previous offerings of the course without LC, and found a statistically significant increase in performance when LC was available. It is worth noting that this study did not account for differences in GPA between the cohorts, and the instructors were not identical between the cohorts.

Of the 27 LC studies reviewed, the majority (14) indicated a positive impact on academic performance. However, 10 of the 14 positive studies were LC use studies and only 4 were LC availability studies. Levels of control varied across these studies, and from the literature reviewed above, LC appears to be most beneficial when used to supplement in-class lectures.

2.2 Lecture capture has a negligible impact on academic performance

While the majority of studies involving LC and academic performance have been positive, a number of studies have reported a negligible impact. Of the studies reviewed, 7 of 17 that focused on LC use and achievement reported a negligible impact, as did 3 of 10 that focused on LC availability and achievement.

2.2.1 Lecture capture use and a negligible impact on academic performance

Seven studies were identified that reported a negligible effect of LC use on performance (Bollmeier et al., 2010; Chapin, 2018; Hadgu et al., 2016; Leadbeater et al., 2013; Mallinson & Baumann, 2015; Sarsfield & Conway, 2018; A. E. Williams et al., 2016).

Bollmeier et al. (2010) limited lecture recording availability to 72 h following each lecture in a therapeutics course and found no link between LC use and final grades when analyzing data from 122 students. Students did perform better on the final exam in this course than on the final exam in a different therapeutics course; however, comparing grades between two courses with different content may not allow for an accurate comparison.

Chapin (2018) compared LC viewing and attendance to academic performance in two psychology courses with an enrollment of 71 students and found no differences in course final grades when comparing the number of views or attendance.

Leadbeater et al. (2013) analyzed data from two years of a modular second-year medical science course with a total enrollment of 140 students in which some modules had lecture recordings available and some did not. Each module ended in an examination, which allowed the authors to compare students’ grades in modules with recordings available to those without, and also to compare lecture recording use with grades in the modules that had recordings available. Neither LC availability nor LC use was associated with a change in academic performance in this study.

Mallinson and Baumann (2015) compared self-reported lecture recording views with final grades in a 350-student introduction to American politics course. The authors categorized students into six viewing categories based on use and found no increase in performance in any of the viewing categories.

Sarsfield and Conway (2018) compared LC viewing and exam grades in 17 different modules (each ending with an exam) across several disciplines, including biology, chemistry, biochemistry, math, and physics, in a single academic year. Students were categorized by grade attained, and the authors found no link between grade attained and the amount of LC viewing. The authors noted that the timing of access appeared to be important, since students who accessed lecture recordings regularly throughout the learning period, rather than just before exams, performed better.

Williams et al. (2016) examined LC use and academic performance in two sections of an introductory biology course with an enrollment of over 800 and found no link between viewing and academic performance when controlling for SAT scores. Similar to Sarsfield and Conway (2018), Williams et al. (2016) found that students who viewed recordings regularly throughout the course, rather than just before exams, performed better.

Hadgu et al. (2016) performed a small study with 63 students in an introductory physiology course in which two lectures were recorded and made available. The authors compared students who only attended the live lectures with those who attended the live lectures and also accessed the recordings. Overall exam performance was indistinguishable between the groups, as was performance on the questions related to the recorded lectures. The authors also examined whether students with high or low grades tended to access the recordings and found no association.

2.2.2 Lecture capture availability and a negligible impact on academic performance

The following studies observed a negligible impact on academic achievement in response to LC availability (Brotherton & Abowd, 2004; Ford et al., 2012; Traphagan et al., 2010). Brotherton and Abowd (2004) developed eClass, a system that automatically generated lecture recordings along with notes, and tested it in multiple sections of a software engineering course and a calculus course with a total of 165 students at two different institutions. The authors found no significant change in performance when eClass was made available.

Ford et al. (2012) studied the impact of LC availability on academic performance in four sections of a general psychology class taught by the same instructor in which two sections with a total of 60 students had LC available and two sections with a total of 59 students did not. The authors found no link between LC availability and academic performance when comparing the two cohorts, but it should be noted that only approximately 20% of lectures in the LC cohort were recorded and made available.

Traphagan et al. (2010) performed an LC availability study in two sections of a geology course and found that exam and quiz scores were higher for the 211 students who did not have access to lecture recordings than for the 153 students who did, but this difference was no longer significant after the authors controlled for GPA.

Overall, 10 of the 27 LC studies reviewed indicated a negligible impact on academic performance; of these, 7 were LC use studies and 3 were LC availability studies. Many of these studies were well controlled and accounted for differences in attendance and GPA. Interestingly, two of the negligible LC use studies indicated that the timing of LC viewing appeared to be more important than the amount of viewing.

2.3 Lecture capture has a negative impact on academic performance

Few studies have implicated LC as having a negative impact on academic performance. We were unable to find a single LC use study with negative results, but we reviewed three LC availability studies that indicated a negative impact on academic performance (Drouin, 2014; Edwards & Clinton, 2019; Johnston et al., 2013).

Drouin (2014) performed an LC availability study in two sections of an introductory psychology course with 141 students and found a decline in academic achievement in the LC group. There was also a significant decline in attendance in the LC group, and when attendance was controlled for, there was no significant difference in grades between the LC and non-LC groups.

Edwards and Clinton (2019) performed an LC availability study in two offerings of a quantitative research course with a total of 321 students and found results similar to those of Drouin (2014): LC availability resulted in a decline in course grades, but this decline became negligible once the accompanying decline in attendance in the LC group was controlled for. Edwards and Clinton (2019) also examined LC viewing and attainment, found no significant relationship between LC views and attainment even when correcting for attendance, and concluded that the net effect of LC in their study was generally negative.

Johnston et al. (2013) performed an LC availability study in two sections of an anatomy and physiology course for nurses with a total enrollment of 499 students and found that the LC group had lower overall academic performance. Again, attendance declined in the LC group. This study also examined LC viewing and performance within the LC group and found a weak positive relationship. The study had a few notable weaknesses: the courses were offered at different campuses with different instructors, and the academic abilities of the cohorts were not considered.

Only 3 of the 27 studies reviewed indicated that LC has a negative impact on academic performance, and all 3 were LC availability studies. Notably, 6 of 10 LC availability studies indicated a negligible or negative impact on academic performance, suggesting that simply making lecture recordings available is unlikely to improve the academic performance of an entire cohort.

2.4 Current study

The current study is a quasi-experimental LC availability study in which two offerings of a course are compared, one with LC available and one without. The instructor and learning materials were the same, and only exam questions that were identical for both cohorts were analyzed. To our knowledge, this is the first study to observe both a significant decline in academic performance and an increased failure rate in an LC group when both attendance and GPA are controlled for.

3 Methodology

Ethics approval for this two-group quasi-experimental study was obtained from the Behaviour Research Ethics Board at the University of Saskatchewan, where data collection occurred. All data in this study are presented as whole-cohort averages so that individual students cannot be identified. Only the primary investigator, who instructed the course, has access to individual student performance in the course, and the data will be destroyed after five years. GPA data were obtained, with ethics approval, from Student Reporting as a mean with standard deviation for each entire cohort; the primary investigator did not have access to individual students’ GPAs.

3.1 Course information

Physiology 208.6 (PHSI 208.6): Human Body Systems is a two-term introductory physiology course that was formerly offered by the Department of Anatomy, Physiology and Pharmacology at the University of Saskatchewan. This sophomore course was offered over the fall and winter terms, and its evaluation consisted of three midterm exams, each weighted at 20% of the final grade, and a final exam weighted at 40% of the final grade. The course also had an optional participation component, weighted at 5%, that used a student response system; for students who used it, the weight of the final exam was reduced accordingly.
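As a minimal sketch of the weighting just described (not the course's actual grading script), the function below computes a course grade from three midterm percentages and a final exam percentage. It assumes that "reduced accordingly" means the final exam weight drops from 40% to 35% when the 5% participation score is used; calling it without a participation score gives the exam-only weighting. The function name and arguments are hypothetical.

```python
def course_grade(midterms, final_exam, participation=None):
    """Weighted course grade (percent) under the PHSI 208.6 scheme.

    midterms: three midterm percentages, each weighted 20%
    final_exam: final exam percentage, weighted 40%
    participation: optional participation percentage; if given, it is
        weighted 5% and the final exam weight drops to 35% (assumption)
    """
    if len(midterms) != 3:
        raise ValueError("expected exactly three midterm scores")
    midterm_part = sum(0.20 * score for score in midterms)
    if participation is None:
        # Exam-only weighting (no participation component).
        return midterm_part + 0.40 * final_exam
    return midterm_part + 0.35 * final_exam + 0.05 * participation


print(round(course_grade([80, 60, 70], 50), 2))       # 62.0
print(round(course_grade([80, 60, 70], 50, 100), 2))  # 64.5
```

Rounding is applied only for display; the weights themselves sum to 1 in both branches, so a student scoring the same percentage on every component receives that percentage as a course grade.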

3.2 Participants

This study used a two-group quasi-experimental design: students enrolled in the 2017–18 offering of PHSI 208.6, who did not have access to lecture recordings, formed the control group, and students enrolled in the 2018–19 offering, who did have access to lecture recordings, formed the experimental group. To keep the control and experimental groups uniform, only students who completed all four exams in the course were included. Students who missed exams or withdrew from the course before completion were excluded, resulting in 433 students in the 2017–18 control group and 414 students in the 2018–19 experimental group. These two cohorts were chosen because the instructor, content, textbook, and evaluation methods remained the same for both. Lecture recordings, including audio, slides, and video of the instructor, were captured using the University of Saskatchewan-supported Panopto software and were made available immediately after each class for the entirety of the course.
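The inclusion rule above amounts to a simple filter over student records. This is a hypothetical sketch: the record layout (a dict with a `scores` mapping keyed by the exam names below, where a missing key or `None` marks an exam that was not written) is an assumption for illustration, not the study's data format.

```python
# Hypothetical exam keys; a missing key or a None value means the exam was not written.
EXAMS = ("midterm_1", "midterm_2", "midterm_3", "final_exam")


def completed_all_exams(student):
    """Return True if the student record has a score for every exam."""
    scores = student.get("scores", {})
    return all(scores.get(exam) is not None for exam in EXAMS)


def select_completers(students):
    """Keep only students who wrote all four exams; everyone else is excluded."""
    return [s for s in students if completed_all_exams(s)]


cohort = [
    {"id": "s1", "scores": {"midterm_1": 80, "midterm_2": 70, "midterm_3": 75, "final_exam": 72}},
    {"id": "s2", "scores": {"midterm_1": 65, "midterm_2": None, "midterm_3": 60, "final_exam": 58}},
    {"id": "s3", "scores": {"midterm_1": 90, "midterm_2": 85}},  # withdrew mid-course
]
print([s["id"] for s in select_completers(cohort)])  # ['s1']
```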

3.3 Exam analysis

Following completion of the 2018–19 academic year, the course instructor analyzed the questions that were identical for the control and experimental groups. Midterm I grades were calculated from 35 identical questions offered to both cohorts, midterm II grades from 39 questions, midterm III grades from 38 questions, and final exam grades from 117 questions. Exams were not made available to students after completion in order to maintain exam integrity.
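Scoring each exam on only the questions common to both offerings can be sketched as below. The question IDs, the per-student response dictionaries, and the choice to count an unanswered shared question as incorrect are all assumptions for illustration; the source does not describe how the instructor stored or scored responses.

```python
def shared_question_ids(exam_ids_control, exam_ids_lc):
    """Question IDs that appear on this exam in both offerings."""
    return sorted(set(exam_ids_control) & set(exam_ids_lc))


def shared_question_score(responses, shared_ids):
    """Percent correct for one student, counting only shared questions.

    responses maps question ID -> True (correct) / False (incorrect).
    A shared question with no recorded response is scored as incorrect
    (an assumption for this sketch).
    """
    if not shared_ids:
        raise ValueError("no shared questions to score")
    correct = sum(1 for q in shared_ids if responses.get(q, False))
    return 100.0 * correct / len(shared_ids)


def cohort_exam_average(cohort_responses, shared_ids):
    """Cohort mean of the shared-question scores."""
    scores = [shared_question_score(r, shared_ids) for r in cohort_responses]
    return sum(scores) / len(scores)


shared = shared_question_ids(["q1", "q2", "q3", "q4"], ["q2", "q3", "q4", "q9"])
print(shared)  # ['q2', 'q3', 'q4']
cohort = [{"q2": True, "q3": True, "q4": False}, {"q2": True, "q1": True}]
print(cohort_exam_average(cohort, shared))
```

Note that the second student's correct answer to `q1` does not count, because `q1` was not offered to both cohorts.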

Final course grades were calculated with the weighting described above and excluded the participation component.

3.4 Participation

Class participation was used as an indirect indicator of attendance in this study. PHSI 208.6 contained an optional participation component that used a digital student response system (Top Hat). Approximately three questions were posed at random times during each lecture, for a total of 203 questions throughout the course. Questions were primarily multiple choice, but true-or-false, click-on-diagram, and matching questions were also used. Participation for each midterm exam was calculated considering only the questions posed in the lectures relating to that exam, while final exam participation considered all questions asked throughout the entire course. Approximately 94% of the 2017–18 cohort answered at least one question on Top Hat, and 82% of the class answered at least 50% of the questions asked throughout the course. Approximately 96% of the 2018–19 cohort answered at least one question on Top Hat, and 82% of the class answered at least 50% of the questions, indicating similar use of the student response system in the two cohorts.
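A hypothetical sketch of the per-exam participation windows described above follows. It assumes each Top Hat question is tagged with the lecture number in which it was posed and that each midterm is identified by a lecture number on which no questions were asked; these data structures are assumptions, not Top Hat's export format.

```python
def _percent_answered(question_ids, answered):
    """Percent of the given questions that the student answered."""
    if not question_ids:
        return float("nan")
    return 100.0 * sum(1 for q in question_ids if q in answered) / len(question_ids)


def participation_by_exam(question_lecture, answered, midterm_lectures):
    """Participation for each midterm window plus the whole course.

    question_lecture: dict mapping question ID -> lecture number it was posed in
    answered: set of question IDs this student responded to
    midterm_lectures: ascending lecture numbers on which midterms I-III were held
    Returns [midterm I, midterm II, midterm III, final exam] percentages.
    """
    rates = []
    previous_exam = 0
    for exam_lecture in midterm_lectures:
        # Questions posed after the previous midterm and before this one.
        window = [q for q, lec in question_lecture.items()
                  if previous_exam < lec < exam_lecture]
        rates.append(_percent_answered(window, answered))
        previous_exam = exam_lecture
    # Final exam participation uses every question asked in the course.
    rates.append(_percent_answered(list(question_lecture), answered))
    return rates


questions = {"a": 1, "b": 2, "c": 4, "d": 5, "e": 7, "f": 8}
print(participation_by_exam(questions, {"a", "c", "d", "f"}, [3, 6, 9]))
```

Cohort-level participation, as reported in the results, would then be the mean of these per-student percentages for each window.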

3.5 Statistical analysis

Statistical comparisons of parametric data (exam grades, participation, final grades, and GPA) were conducted using two-tailed unpaired t-tests, which compare the means of two independent groups to determine whether they differ significantly (Lakens, 2013); statistical significance was set at p < 0.05. Final grades and GPAs were also analyzed for effect size using Cohen’s d, which expresses the difference between two group means in units of their pooled standard deviation; by convention, d = 0.2, 0.5, and 0.8 indicate small, medium, and large effects, respectively (Cohen, 1988). Course passes and failures were analyzed using a non-parametric chi-squared test, which tests whether two frequency distributions differ significantly (McHugh, 2013); statistical significance was set at p < 0.05. Where applicable, results are presented as the mean ± SEM.
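The three procedures named above can be sketched from summary statistics using only the Python standard library. This is an illustration under stated assumptions, not the authors' analysis code: the t-test uses the pooled-variance (Student's) form, its p-value uses a normal approximation to the t distribution (reasonable at df = 845 but not exact), standard deviations are recovered from reported SEMs as SEM·√n, and the chi-squared test is Pearson's without a continuity correction.

```python
import math
from statistics import NormalDist


def sd_from_sem(sem, n):
    """Recover a standard deviation from a standard error of the mean."""
    return sem * math.sqrt(n)


def _pooled_variance(sem1, n1, sem2, n2):
    sd1, sd2 = sd_from_sem(sem1, n1), sd_from_sem(sem2, n2)
    return ((n1 - 1) * sd1**2 + (n2 - 1) * sd2**2) / (n1 + n2 - 2)


def pooled_t_test(mean1, sem1, n1, mean2, sem2, n2):
    """Two-tailed unpaired Student's t-test from summary statistics.

    Returns (t, df, p); p uses a normal approximation to the t distribution.
    """
    df = n1 + n2 - 2
    se_diff = math.sqrt(_pooled_variance(sem1, n1, sem2, n2) * (1 / n1 + 1 / n2))
    t = (mean1 - mean2) / se_diff
    p = 2 * (1 - NormalDist().cdf(abs(t)))
    return t, df, p


def cohens_d(mean1, sem1, n1, mean2, sem2, n2):
    """Cohen's d: mean difference divided by the pooled standard deviation."""
    return (mean1 - mean2) / math.sqrt(_pooled_variance(sem1, n1, sem2, n2))


def chi_squared_2x2(pass1, fail1, pass2, fail2):
    """Pearson chi-squared test (no continuity correction) on a 2x2 pass/fail table.

    With 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    """
    table = [[pass1, fail1], [pass2, fail2]]
    total = pass1 + fail1 + pass2 + fail2
    row_totals = [pass1 + fail1, pass2 + fail2]
    col_totals = [pass1 + pass2, fail1 + fail2]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (table[i][j] - expected) ** 2 / expected
    return chi2, math.erfc(math.sqrt(chi2 / 2))


# Midterm I summary values as reported in the Results (mean, SEM, n).
t, df, p = pooled_t_test(76.81, 0.75, 433, 72.59, 0.90, 414)
print(f"t({df}) = {t:.3f}, p = {p:.4f}")
```

With the rounded midterm I summary values, `pooled_t_test` gives t ≈ 3.6 on 845 degrees of freedom, close to the reported t(845) = 3.619; the small gap reflects rounding of the published means and SEMs. Pass/fail counts are not given in this section, so any counts passed to `chi_squared_2x2` are the reader's own.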

4 Results

4.1 Lecture capture availability resulted in a decline in exam grades

To study the effect of LC on academic performance in this course, exam averages were calculated using only the questions from each exam that were identical for both cohorts. The availability of lecture recordings to the 2018–19 cohort resulted in a decrease in all four exam averages (Fig. 1).

Fig. 1

A comparison of exam grades before (2017–18) and after the availability of lecture capture (2018–19). Exam grades were compared using only identical questions asked to both cohorts of students. (A) Midterm I grade; 35 questions; 2017–18 (n = 433): 76.81%; 2018–19 (n = 414): 72.59%; difference: -4.21%, p < 0.001. (B) Midterm II grade; 39 questions; 2017–18 (n = 433): 67.90%; 2018–19 (n = 414): 67.01%; difference: -0.89%, p = 0.437. (C) Midterm III grade; 38 questions; 2017–18 (n = 433): 71.96%; 2018–19 (n = 414): 69.36%; difference: -2.60%, p = 0.037. (D) Final exam grade; 117 questions; 2017–18 (n = 433): 70.11%; 2018–19 (n = 414): 66.43%; difference: -3.67%, p = 0.001.

For midterm I, 35 identical questions were analyzed for each cohort to calculate the exam averages. The class average declined from 76.81 ± 0.75% in 2017–18 to 72.59 ± 0.90% in 2018–19 after lecture recordings were made available. This 4.21% decline in the class average was statistically significant, t(845) = 3.619, p < 0.001 (Fig. 1A).
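As a rough arithmetic check, the midterm I statistic can be approximately reproduced from the reported means and SEMs, assuming the pooled-variance (Student's) form of the unpaired t-test and converting each SEM back to a standard deviation via s = SEM·√n. This is a reconstruction from rounded summary values, not the authors' computation.

```latex
\begin{aligned}
s_1 &= 0.75\sqrt{433} \approx 15.61, \qquad s_2 = 0.90\sqrt{414} \approx 18.31,\\
s_p^2 &= \frac{(n_1-1)s_1^2 + (n_2-1)s_2^2}{n_1+n_2-2}
       = \frac{432(243.56) + 413(335.34)}{845} \approx 288.4,\\
\mathrm{SE} &= \sqrt{s_p^2\left(\frac{1}{n_1}+\frac{1}{n_2}\right)}
       = \sqrt{288.4\left(\frac{1}{433}+\frac{1}{414}\right)} \approx 1.167,\\
t &= \frac{\bar{x}_1-\bar{x}_2}{\mathrm{SE}} = \frac{76.81-72.59}{1.167} \approx 3.6,
  \qquad \mathrm{df} = n_1 + n_2 - 2 = 845 .
\end{aligned}
```

The result agrees with the reported t(845) = 3.619 to within the rounding of the published summary values.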

For midterm II, 39 identical questions were analyzed, and the class average declined from 67.90 ± 0.75% in 2017–18 to 67.01 ± 0.87% in 2018–19. This subtle 0.89% decline was not statistically significant, t(845) = 0.776, p = 0.437 (Fig. 1B), which may reflect students’ awareness that the second midterm had historically been the most difficult in the course. For midterm III, 38 identical questions were analyzed, and the class average again declined, from 71.96 ± 0.88% in 2017–18 to 69.36 ± 0.91% in 2018–19. This 2.60% decline was statistically significant, t(845) = 2.083, p = 0.037 (Fig. 1C). Lastly, the final exam averages, calculated using 117 identical questions, declined from 70.11 ± 0.76% in 2017–18 to 66.43 ± 0.87% in 2018–19. This 3.67% decline was statistically significant, t(845) = 3.202, p = 0.001 (Fig. 1D).

The 2017–18 and 2018–19 offerings of PHSI 208.6 remained nearly identical: the instructor, textbook, content, and evaluation methods were the same, and the only notable difference was the availability of lecture recordings. Based on the analysis of identical questions posed to both cohorts, the availability of lecture recordings resulted in a decline in the class average on all four exams in the course, with three of the four declines being statistically significant.

4.2 Lecture capture had no effect on course attendance

Previous studies on the use of LC in higher education have identified a decline in attendance (Bos et al., 2016; Gosper et al., 2010; Morris et al., 2019; Traphagan et al., 2010). Moreover, in each of the only three studies that have found LC to negatively affect academic performance (Drouin, 2014; Edwards & Clinton, 2019; Johnston et al., 2013), the negative impact on grades was attributed to a decline in attendance. To rule out the possibility that a decline in attendance caused the decline in exam grades observed in this course after the introduction of LC, we analyzed in-class participation as an indirect indicator of attendance.

A student response system was used in both the 2017–18 and 2018–19 offerings of PHSI 208.6. Approximately three multiple-choice, true-or-false, click-on-diagram, or matching questions were asked at random times during each lecture throughout both semesters. Because questions were posed at random times throughout each lecture, students who participated were assumed to be in attendance, and participation was therefore used as an indirect indicator of attendance. Figure 2 shows the average participation during the period leading up to each exam. First, we measured the average participation from the first lecture of the course to the lecture before the first midterm exam. Participation in this period was 80.17 ± 1.37% in 2017–18 and increased by 2.56% to 82.74 ± 1.24% in 2018–19, after LC was made available; this increase was not statistically significant, t(845) = 1.386, p = 0.166 (Fig. 2A).

Fig. 2

A comparison of in-class participation before and after the availability of lecture capture. A total of 203 questions were asked at random during class throughout the year using a student response system. (A) Percent of questions answered in class from the first day of class up to the first midterm (2017–18: 80.17 ± 1.37; 2018–19: 82.74 ± 1.23). Difference: +2.56, p = 0.166. (B) Percent of questions answered in class from after the first midterm up to the second midterm (2017–18: 76.56 ± 1.46; 2018–19: 75.44 ± 1.46). Difference: -0.62, p = 0.764. (C) Percent of questions answered in class from after the second midterm up to the third midterm (2017–18: 74.33 ± 1.53; 2018–19: 72.69 ± 1.58). Difference: -1.64, p = 0.455. (D) Percent of questions answered in class throughout the entire course (2017–18: 75.08 ± 1.42; 2018–19: 74.51 ± 1.40). Difference: -0.57, p = 0.769.

Next, average participation was calculated for the questions asked from the lecture following the first midterm up to the lecture before the second midterm. Average participation decreased slightly from 76.56 ± 1.46% in 2017–18 to 75.44 ± 1.46% in 2018–19, a difference of 0.62% that was not statistically significant, t(845) = 0.299, p = 0.764 (Fig. 2B). We then calculated the average participation from the lecture following the second midterm up to the lecture before the third midterm. Average participation decreased from 74.33 ± 1.52% in 2017–18 to 72.69 ± 1.58% in 2018–19, a decline of 1.64% that was not statistically significant, t(845) = 0.747, p = 0.455 (Fig. 2C). Finally, we calculated the average participation throughout the entire course: 75.08 ± 1.42% in 2017–18 versus 74.51 ± 1.40% in 2018–19, a difference of 0.57% that was not statistically significant, t(845) = 0.283, p = 0.776 (Fig. 2D).

The participation analysis indicates that there was no statistically significant difference in class participation in the period before any single exam or across the entire course. This result suggests that the decline in grades among LC students was not due to a decline in lecture attendance.

4.3 Final grade and GPA analysis indicates lecture capture is partially responsible for the decline in performance

Although the two cohorts were large and of similar size (2017–18 n = 433; 2018–19 n = 414), the decline in academic performance observed after the introduction of LC could still have been due to differences in the academic abilities of the cohorts. To rule out this possibility, we calculated the final average grades in PHSI 208.6 and compared them with the average GPA in all other courses taken concurrently with PHSI 208.6.

The average final grade for each cohort was calculated excluding the optional participation component of the course, so that the grade presented is based solely on identical exam questions posed to both cohorts. The final average grade in the 2017–18 offering of PHSI 208.6 was 71.38 ± 0.71%; after lecture recordings were made available in the 2018–19 offering, it decreased by 3.01% to 68.37 ± 0.80%, a statistically significant decline, t(845) = 2.815, p = 0.005 (Fig. 3A).

Fig. 3

A comparison of final grades in PHSI 208.6 and cumulative GPA calculated from all other courses taken concurrently in that academic year. A) Comparison of final grades in PHSI 208.6 (2017–18: 71.38 ± 0.71; 2018–19: 68.37 ± 0.80). Difference: -3.01%, p = 0.005. B) Comparison of average cumulative GPA calculated from all other courses taken concurrently with PHSI 208.6 (2017–18: 78.34 ± 0.53; 2018–19: 76.85 ± 0.59). Difference: -1.49%, p = 0.060.

Average GPAs for each cohort were obtained with ethical approval from the University of Saskatchewan. The average GPA for each cohort was based on all courses taken concurrently with PHSI 208.6 during the fall and winter terms, excluding the PHSI 208.6 grade itself. The average GPA of the 2017–18 cohort was 78.34 ± 0.53%, while that of the 2018–19 cohort was 76.85 ± 0.59%. This difference of 1.49% was not statistically significant, t(845) = 1.877, p = 0.060 (Fig. 3B).

Since a decline in both the PHSI 208.6 final grade and GPA was observed in the 2018–19 cohort, we calculated Cohen’s d for the final grades and the GPAs independently to assess effect size.

Cohen’s d was 0.193 for the PHSI 208.6 final grades and 0.128 for the GPAs, a difference of 0.064. Cohen (1988) suggested the following benchmarks for interpreting effect size: small (d = 0.2), medium (d = 0.5) and large (d = 0.8). By these benchmarks, the decline in final grades is a small (d ≈ 0.2) but statistically significant effect, whereas the decline in GPA is a very small or even negligible effect that was not statistically significant. This suggests that the decline in grades is at least partially due to the availability of lecture recordings.
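Both effect sizes can be approximately recovered from the reported means, SEMs and cohort sizes. This is a sketch under two assumptions not stated in the text: the ± values are standard errors of the mean, and d uses the pooled SD of the two cohorts. It also shows the equivalent form d = t·√(1/n1 + 1/n2) for a pooled t statistic. The function names are hypothetical.

```python
import math


def cohens_d_from_sem(mean1, sem1, n1, mean2, sem2, n2):
    """Cohen's d with a pooled SD, recovering each SD as SEM * sqrt(n)."""
    var1 = (sem1 * math.sqrt(n1)) ** 2
    var2 = (sem2 * math.sqrt(n2)) ** 2
    pooled_sd = math.sqrt(((n1 - 1) * var1 + (n2 - 1) * var2) / (n1 + n2 - 2))
    return (mean1 - mean2) / pooled_sd


def cohens_d_from_t(t, n1, n2):
    """Equivalent d computed from a pooled-variance two-sample t statistic."""
    return t * math.sqrt(1 / n1 + 1 / n2)


# 2017-18 (n = 433) versus 2018-19 (n = 414), summary statistics from the text.
d_final = cohens_d_from_sem(71.38, 0.71, 433, 68.37, 0.80, 414)
d_gpa = cohens_d_from_sem(78.34, 0.53, 433, 76.85, 0.59, 414)
d_final_from_t = cohens_d_from_t(2.815, 433, 414)
print(f"final grade: d = {d_final:.3f} (from t: {d_final_from_t:.3f})")
print(f"GPA: d = {d_gpa:.3f}")
```

Because the inputs are rounded, the third decimal may differ slightly from the published 0.193 and 0.128.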

4.4 A greater number of students failed the course after lecture recordings were made available

Figure 4A shows the final grade frequency distribution for the two cohorts. Three observations from these data stand out: 1) the availability of LC resulted in an increase in the proportion of students failing the course, from 8.78% of the class in 2017–18 to 14.25% in 2018–19 after lecture recordings were made available; 2) lecture recording availability appeared to reduce the proportion of students in the 70–79% range (26.10% in 2017–18 vs 17.87% in 2018–19); and 3) the proportion of the class in the 90–100% range remained almost identical (9.70% in 2017–18 vs 9.90% in 2018–19).
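The shift in each grade band between cohorts can be tabulated directly from the percentages reported for Fig. 4A. The sketch below performs only this arithmetic on the published values (it does not use student-level data); the variable and function names are illustrative.

```python
# Share of the class (%) in each final-grade band, as reported for Fig. 4A.
BANDS = ["<50", "50-59", "60-69", "70-79", "80-89", "90-100"]
COHORT_2017 = [8.78, 15.47, 18.01, 26.10, 21.94, 9.70]
COHORT_2018 = [14.25, 18.60, 19.32, 17.87, 20.05, 9.90]


def band_shifts(before, after):
    """Percentage-point change (after minus before) in each band, rounded to 2 decimals."""
    return {band: round(b - a, 2) for band, a, b in zip(BANDS, before, after)}


shifts = band_shifts(COHORT_2017, COHORT_2018)
largest = max(shifts, key=lambda band: abs(shifts[band]))
for band, change in shifts.items():
    print(f"{band:>7}: {change:+.2f} points")
print("largest absolute shift:", largest)
```

On these figures, the largest single change is the drop in the 70–79% band, while the failing band shows the largest increase and the 90–100% band barely moves.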

Fig. 4

Final grade frequency distribution and a comparison of pass/fail rates. A) The final grade frequency distribution by % of class. 2017–18: < 50% = 8.78%; 50–59% = 15.47%; 60–69% = 18.01%; 70–79% = 26.10%; 80–89% = 21.94%; 90–100% = 9.70%. 2018–19: < 50% = 14.25%; 50–59% = 18.60%; 60–69% = 19.32%; 70–79% = 17.87%; 80–89% = 20.05%; 90–100% = 9.90%. B) A comparison of the number of students that passed vs failed the course. 2017–18: pass = 395, fail = 38; 2018–19: pass = 355, fail = 59; p = 0.012.

Figure 4B compares the number of students who passed or failed each offering of the course. The failure rate in the 2017–18 offering (38 failures / 433 students) was significantly lower than in the 2018–19 offering after lecture recordings were made available (59 failures / 414 students), as determined by a chi-squared test, χ²(1, N = 847) = 6.257, p = 0.012 (Fig. 4B). Overall, this corresponds to an increase in the failure rate of approximately 5.5 percentage points (from 8.78% to 14.25%) for students who had LC made available to them.
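The chi-squared statistic and failure rates can be checked from the pass/fail counts in Fig. 4B. This sketch assumes a Pearson test on the 2 × 2 table without Yates' continuity correction, which matches the reported value; applying the correction would give a smaller statistic. With one degree of freedom the upper-tail probability equals erfc(√(χ²/2)), so no statistics library is needed. The function name is hypothetical.

```python
import math


def chi_squared_2x2(a, b, c, d):
    """Pearson chi-squared for the table [[a, b], [c, d]], no continuity correction."""
    n = a + b + c + d
    chi2 = n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))
    # For 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p


# Rows: 2017-18 and 2018-19 cohorts; columns: pass, fail (counts from Fig. 4B).
chi2, p = chi_squared_2x2(395, 38, 355, 59)
fail_2017 = 38 / 433 * 100
fail_2018 = 59 / 414 * 100
print(f"chi2(1, N = 847) = {chi2:.3f}, p = {p:.3f}")
print(f"failure rate: {fail_2017:.2f}% -> {fail_2018:.2f}% "
      f"({fail_2018 - fail_2017:+.2f} percentage points)")
```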

5 Discussion

In the present study, we performed a well-controlled investigation into the impact LC availability has on academic performance in a large biomedical science course. In comparison to the 2017–18 cohort, all aspects of the course remained the same for the 2018–19 cohort with the availability of lecture recordings being the only difference.

We used only questions that were identical for both cohorts to calculate the exam averages and found a decline in the class average on all four exams in the course following the availability of lecture recordings; three of these four declines were statistically significant by an unpaired t-test. Final grades calculated using only the exam marks also showed a statistically significant decline. Although a decline in grades due to LC availability has not been frequently reported, three previous studies have noted one (Drouin, 2014; Edwards & Clinton, 2019; Johnston et al., 2013); however, all three also reported a significant decline in attendance and attributed the decline in grades to reduced attendance. In contrast, student response system data in the present study indicate that class attendance remained similar between the cohorts in the period leading up to every exam. Because the student response system was online, students could potentially answer questions from anywhere without being in class; however, questions were asked at random during lecture, which reduced the likelihood of this occurring, and because the same response system was used for both cohorts, the possibility of students answering questions from another location applies to both.

To rule out the possibility that the decline in final grades was due to an academically weaker cohort, we calculated the average GPA of each cohort using only the courses taken concurrently with PHSI 208.6. We observed a non-significant reduction in GPA in the 2018–19 cohort, but it was smaller than the decline in PHSI 208.6 final grades. Nonetheless, the trend toward a lower GPA in the 2018–19 cohort may have contributed to the reduced final grades observed in PHSI 208.6.

A statistically significant p-value from a t-test indicates whether an effect is likely to exist, but calculating effect size is important for understanding the magnitude of that effect, particularly when sample sizes are large (Sullivan & Feinn, 2012). The effect size we calculated (d = 0.193) indicates that the decline in PHSI 208.6 grades was a small but statistically significant effect by Cohen’s standards, which is not surprising given that the decline was 3.01%. Although modest, a decline of this size can be very meaningful in certain contexts; for example, the difference between a GPA of 80% and 77% can significantly hinder a student’s opportunities by reducing their chances of obtaining scholarships or gaining entrance to professional programs. In addition, recent research on educational interventions and academic achievement suggests that Cohen’s benchmarks may be too high and that effects considered small by Cohen’s standards can be meaningful in context (Kraft, 2020).

Since the decline in grades observed after the introduction of lecture recordings was not due to changes in attendance or significant differences in academic ability between the cohorts, the question remains as to why we observed such a decline. One prospective study that we aim to perform, and that may shed light on our findings, is a comprehensive student survey focusing on in-class behaviours when students know that lecture recordings are available, as well as on behaviours while reviewing lecture recordings. It is possible that attentiveness, note-taking and the level of multitasking in class all change when students know that lecture recordings are available to review at their convenience; and while most students may intend to review the recordings, the high levels of procrastination previously observed in students (Day et al., 2000) may prevent this from happening.

The change in the distribution of grades after the introduction of lecture recordings is also noteworthy. Previous work suggests that low achievers are the greatest users of LC (Groen et al., 2016; Owston et al., 2011), but primarily for surface rather than deep learning, and we found that the greatest change in distribution occurred in the lower grade categories. In contrast, the proportion of students achieving excellent grades (90–100%) remained nearly constant between the two cohorts, which is consistent with a previous study noting that LC appears to have less of an impact on high-achieving students (Groen et al., 2016).

A particularly important finding of the current study was the statistically significant difference in pass/fail outcomes after the introduction of lecture recordings. As mentioned previously, very few LC studies have found a decline in academic performance after LC was made available, and even fewer have examined the number of course failures. Johnston et al. (2013) was the only study reviewed that noted a difference in failure rate after LC became available; however, that study did not control for the GPAs of the two cohorts and was otherwise not well controlled, since it was performed at different campuses with different instructors. Given both the significant decline in academic performance and the increased failure rate, we conclude that, in the context of the current study, LC had a negative impact on student success.

6 Conclusion

Although the results of this study imply that LC has a negative impact on student success, we would not urge instructors to immediately stop using LC in their courses. Many studies, highlighted in the literature review of this article, have shown LC to have a positive impact on academic performance. Students generally have a positive perception of LC (Dommeyer, 2017; Gosper et al., 2010; Nashash and Gunn, 2013; Owston et al., 2011; Traphagan et al., 2010) and report a variety of benefits, identified in a recent systematic review on the topic (Banerjee, 2021), including but not limited to facilitated learning and revision, a better work–life balance, supplementation of lecture notes, overcoming language barriers and reduced anxiety.

We believe that this study serves as a cautionary tale about the use of LC, and we urge instructors to communicate the findings of this study and others so that students are properly informed about this technology’s best use. Although many studies have reported a positive impact of LC on academic performance, the results appear to depend on how the technology is used, and many studies have implied that the positive results are restricted to students who use LC to supplement, rather than replace, live lectures (Bos et al., 2016; Brooks et al., 2014; Dommett et al., 2019; Edwards & Clinton, 2019; Sarsfield & Conway, 2018).