Peer discussion on online class forums
In both courses there were many more post-test discussion threads after the (first) test in the year with repeat tests, consistent with enhanced peer learning being at play (Table 3). Pearson \(\chi ^2\) tests showed that the increase was highly statistically significant in both courses (\(p<0.001\)).
Table 3 Class forum discussion threads about tests in the two courses after the (first) test – 2017 had no repeat tests while 2018 did

Statistical methodology to analyse test and exam performance
The binary outcomes (correct or incorrect) of the questions on the two midterm tests and final exam are not statistically independent, because multiple outcomes were measured on each student and because variants of each isomorphic question were used repeatedly. Hence a repeated-measures analogue of contingency table analysis was performed: mixed-effects logistic regression models were fitted with the glmmTMB function in R, with student and question included as random effects.
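For concreteness, a minimal sketch of one such fit in R is given below; the data frame and column names (results, correct, student, question) are hypothetical, since the original code is not shown, and the fixed effects are those of the Test 1 analysis.

# Sketch only: mixed-effects logistic regression with random intercepts
# for student and question (hypothetical data frame and column names)
library(glmmTMB)

fit <- glmmTMB(
  correct ~ Ability + Asgmt + (1 | student) + (1 | question),
  data   = results,    # one row per student-by-question binary outcome
  family = binomial    # logistic link for correct/incorrect outcomes
)
summary(fit)           # estimates and Wald z-tests for the fixed effects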
Overall student ability was expected to have an effect on the learning pathway, so as a measure of overall ability the students in each class were split into two equally sized groups at the median final exam score: high ability (above the median) and low ability (below the median).
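This split can be sketched in R as follows, assuming a hypothetical vector exam of final exam scores, one element per student:

# Median split into ability groups (hypothetical vector `exam`)
ability <- factor(ifelse(exam > median(exam), "high", "low"),
                  levels = c("low", "high"))  # "low" as the baseline level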
The following questions were examined:
- What effect does overall student ability have on the probability of answering a question correctly in Test 1? Does having previously seen the question in an assignment (course 2xx only) also have an effect?
- Was there an improvement from Test 1 to Test 2? Moreover, does this effect depend on overall student ability, or on whether the question was previously seen in an assignment (course 2xx only)?
- How did test performance influence final exam performance?
In what follows, the notation Ability, Asgmt, T1, and T2 is used to denote the variables corresponding to student ability (high or low), whether the question material was seen in an assignment prior to Test 1 (yes or no), Test 1 question outcome (correct or incorrect), and Test 2 question outcome (correct or incorrect).
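Up to the particular fixed effects included, each fitted model then has the standard mixed-effects logistic form (a sketch; the symbols below do not appear in the original):
\[ \operatorname{logit} P(Y_{ij}=1) = \mathbf{x}_{ij}^{\top}\boldsymbol{\beta} + u_i + v_j, \qquad u_i \sim N(0,\sigma_u^2), \quad v_j \sim N(0,\sigma_v^2), \]
where \(Y_{ij}\) is the outcome (1 = correct) of student \(i\) on question \(j\), \(\mathbf{x}_{ij}\) holds the fixed effects (Ability, Asgmt and T1, as appropriate), and \(u_i\) and \(v_j\) are the student and question random intercepts.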
Course 2xx results
Analysis of test 1 performance
Analysis of Test 1 results included the explanatory variables Ability and Asgmt. Over all students and questions, the proportion of Test 1 questions answered correctly was 0.520. Whether or not the question material had previously been seen in assignments made little difference, with success rates of 0.529 if previously seen and 0.501 if not; this difference was not statistically significant (\(p>0.05\)). For high-ability students, the log-odds of answering a question correctly were 0.743 higher than for low-ability students (\(p<0.001\)).
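To put this on a more familiar scale (an illustrative conversion, not from the original), a log-odds difference of 0.743 corresponds to an odds ratio of
\[ e^{0.743} \approx 2.10, \]
so, all else being equal, a question that a low-ability student answers correctly with probability 0.50 (odds of 1) would be answered correctly by a high-ability student with probability \(2.10/(1+2.10) \approx 0.68\).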
Analysis of test 2 performance
Comparison with Test 1: The overall success rates on the questions in Tests 1 and 2 were 0.520 and 0.775 respectively, and the improvement was highly significant (\(p<0.001\)).
Additional effects of ability and assignment: The full analysis of Test 2 results included the explanatory variables Ability, Asgmt and T1. Students with high ability had log-odds of a correct answer that were 0.643 higher than those of low-ability students (\(p<0.001\)), and this effect of Ability was independent of Asgmt and T1 (Table 4). The variables T1 and Asgmt were both highly significant, as was their interaction (\(p<0.01\)).
Answering a T1 question correctly increased the odds of answering the same T2 question correctly, and having previously seen the question material in an assignment also increased the odds. Furthermore, the interaction between Asgmt and T1 showed that there was an additional positive benefit from having both answered the question correctly in T1 and having previously seen the question material in an assignment (Table 4).
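Concretely, the fixed-effects part of this model can be written as (a sketch; the coefficient symbols are not from the original)
\[ \eta = \beta_0 + \beta_{A}[\text{high ability}] + \beta_{S}[\text{seen in assignment}] + \beta_{T}[\text{T1 correct}] + \beta_{ST}[\text{seen}][\text{T1 correct}], \]
where the bracketed terms are 0/1 indicators; the positive interaction coefficient \(\beta_{ST}\) captures the additional benefit of having both seen the material in an assignment and answered the Test 1 question correctly.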
Table 4 Effect of Ability, Asgmt and T1 on the log-odds of correctly answering an isomorphic Test 2 question. The baseline is for a low-ability student who incorrectly answered the question in Test 1, for a question not previously seen in an assignment

Analysis of exam performance
Two test questions were repeated in the exam, both on material previously seen in assignments prior to Test 1. These two questions had the highest scores of all the exam questions, with a combined success rate of 0.941. Because the limited data made analyses with multiple explanatory variables numerically unstable, only T2 was used as an explanatory variable; it was statistically significant (\(p<0.001\)). Combined over the two questions, students who had answered an isomorphic T2 question incorrectly had a 0.865 success rate, increasing to 0.964 for those who had answered correctly in T2. This corresponds to an increase in log-odds of 1.54.
Course 3xx results
Analysis of test 1 performance
None of the test questions had previously been seen in assignment material, so Ability was the only explanatory variable. High-ability students had log-odds of answering a Test 1 question correctly that were 0.678 higher than those of low-ability students (\(p<0.001\)).
Analysis of test 2 performance
Comparison with Test 1: The overall success rates on the questions in Tests 1 and 2 were 0.665 and 0.879 respectively, and the difference was highly significant (\(p<0.001\)).
Additional effect of ability: The full analysis of Test 2 results included the explanatory variables Ability and T1. Students with high ability had log-odds of answering correctly that were 0.741 higher than those of low-ability students (\(p<0.001\)), and answering the Test 1 question correctly increased the log-odds of answering the Test 2 question correctly by 0.475 (\(p<0.001\)). There was no significant interaction between Ability and T1 (Table 5).
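Since the two effects are additive on the log-odds scale (no interaction), they can be combined directly; as an illustrative calculation not in the original,
\[ 0.741 + 0.475 = 1.216, \qquad e^{1.216} \approx 3.37, \]
so a high-ability student who answered the Test 1 question correctly had roughly 3.4 times the odds of answering the isomorphic Test 2 question correctly, relative to a low-ability student who answered it incorrectly.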
Table 5 Effect of Ability and T1 on the log-odds of correctly answering an isomorphic Test 2 question

Analysis of exam performance
Three test questions were repeated in the exam, and performance on these three questions was significantly better than on the other questions (\(p<0.001\)), with a combined success rate of 0.932. As with the course 2xx analysis, it was only possible to use T2 as an explanatory variable, and it was statistically significant (\(p<0.001\)). Combined over the three questions, students who answered an isomorphic T2 question incorrectly had a 0.773 success rate, increasing to 0.946 for those who answered correctly in T2. This corresponds to an increase in log-odds of 1.63.
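As a quick sanity check (not in the original), this estimate agrees, up to rounding, with the difference of the raw logits:
\[ \ln\frac{0.946}{0.054} - \ln\frac{0.773}{0.227} \approx 2.86 - 1.23 = 1.63. \]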
Comparison of grade distributions
Table 6 shows the raw (unscaled) grade distributions for the two courses in 2017, when there was no repeat assessment, and in 2018. The 2017 grade distributions for both courses were skewed towards lower grades, especially in the 2xx course, which had a grade average below 3 (C+). Consequently, the 2017 raw grades shown in Table 6 were manually scaled up for both courses to improve the distribution. The 2018 grade distributions were more bell-shaped, with improvements in grade average of 0.92 and 0.42 in 2xx and 3xx, respectively. No adjustments were made to the 2018 raw grades.
Table 6 Raw grade distributions in the two courses – 2017 had no repeat assessments while 2018 did

For each course, a Pearson's \(\chi ^2\) test was performed to determine whether there was a significant difference between the 2017 and 2018 grade distributions. The tests yielded a p-value \(<0.001\) for course 2xx and a p-value of 0.007 for course 3xx, both providing highly significant evidence of a difference.
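Such a comparison can be sketched in R as follows; the grade counts below are placeholders for illustration only, since the actual counts are those of Table 6.

# Hypothetical grade counts (rows = grade bands, columns = years);
# the real analysis would use the counts from Table 6
grades <- matrix(c(5, 12, 20, 18,  9,    # 2017 counts per grade band
                   3,  8, 15, 24, 14),   # 2018 counts per grade band
                 ncol = 2,
                 dimnames = list(grade = c("E", "D", "C", "B", "A"),
                                 year  = c("2017", "2018")))
chisq.test(grades)   # Pearson's chi-squared test of homogeneity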