This quasi-experimental study examined the effect of RP on student learning and, in turn, on final exam performance in a large undergraduate Gross Anatomy course at Queen’s University, Kingston, ON, Canada. The course ran for one 12-week semester, with three lectures and one 2-h lab in the Anatomy Learning Centre per week. The course covered the gross anatomical structures of the head, neck, and thorax, including detailed descriptions of the structures’ neurovasculature. The instructor typically teaches this course using the course textbook and a document camera, highlighting important textual information, labelling figures and images, and presenting anatomical specimens and models obtained from the Anatomy Learning Centre. In past years, when RP was not practiced in this class, the class average was between 65% and 70%. This study was approved by the Health Sciences and Affiliated Teaching Hospitals Research Ethics Board (HSREB) at Queen’s University.
Participants
All participants (N = 248) in this study were full-time students enrolled in the Gross Anatomy course. For most students, this course was a requirement for credits towards their respective degrees, and most of them therefore attended lectures regularly. These students were enrolled in one of three programs: third-year Life Sciences (LISC) (N = 60), second-year Kinesiology (KIN) (N = 121), and third-year Physical Education (PHED) (N = 43). All other students, collectively classified as ‘Others’ (N = 24), took this course as an elective. Participation in this research was voluntary, and participants were required to provide informed consent.
Procedure
All enrolled students participated in this study. Participation took place on TopHat©, an online educational, non-analytical software platform that students could download as an app on their electronic devices or access online via https://tophat.com/. Unlike other methods such as clickers, TopHat© allowed for the application of delayed retrieval, a fundamental strategy of RP that makes retrieval effortful [23]. Student usage data were collected from TopHat© servers and de-identified. All analyses were performed on these de-identified data sets. Students were given grades for their participation, up to a maximum of 5% of their overall course grade.
RP was implemented during the first 10 min of each lecture, when 6 to 10 questions were posed. Equal interval spacing was used: students were prompted to practice retrieval of semantic information, comprehension, and application at equal time intervals. This strategy worked well in this course because the instructor employed the Spiral Syllabus [24], in which new information builds upon previous information in a logical, sequential manner. The most important concept areas in this course, and therefore the most frequently repeated RP questions, concerned the neurovasculature of the respective gross anatomical structures. Feedback in the form of correct answers was given to students 1 day after each RP session. The final exam was written approximately 3 weeks after the course ended.
All the questions posed during the course, in both RP and on the final exam, were multiple-choice questions (recognition tests). A total of 252 RP questions were posed during the semester and a total of 148 questions were posed on the final exam. RP was used to study student knowledge of the innervation of the visceral systems; thus, of all the questions that were asked on the final exam, only the questions that addressed innervation were congruent with the questions posed during RP. These questions, which amounted to a total of 67 final exam questions, were designated as completely congruent questions (CCQs).
‘Congruency’ refers to the degree to which an RP question is similar to a question posed on the final exam. It does not imply that the same question was asked both during RP and on the final exam; rather, two questions were considered congruent if the knowledge needed to answer one was also needed to answer the other. For example, “Which of the following nerves innervates the masseter muscle?” was asked as an RP question; a CCQ congruent with this question was “Which cranial nerve, if damaged, causes inability to chew?”
All other questions, which did not address innervation and were posed during RP but not on the final exam, were designated as noncongruent questions (NCQs). An example of an NCQ was “Which of the following is not a bone of the face?” Students were not made aware of these categorizations and were expected to study and learn all the information taught during the course.
Data Analysis
The students’ overall levels of participation in RP were obtained from the tabulated statistics on TopHat©. Their grades for each of the CCQs were obtained from TopHat© and from their final exam answer sheets. All data were tabulated, and data analysis was performed according to demographics. Students were categorized into one of two groups based on their level of participation. Participation was defined as the total number of RP questions a student answered during the semester, regardless of the correctness of those answers. The dichotomous cut-off for participation was set at 85% because of a natural gap in the data (see Table 1), as determined by a consulting statistician in the faculty. Students whose participation was ≥ 85% were therefore classified as the high RP group, and those whose participation was < 85% were classified as the low RP group. The participants in both groups were similar in terms of intellectual ability, as measured by their GPA prior to the commencement of the course.
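The grouping rule described above can be sketched in a few lines of Python. This is only an illustration, not the authors' procedure: it assumes that the participation percentage is the number of RP questions answered divided by the 252 RP questions posed during the semester, and the names `participation_rate` and `rp_group` are hypothetical.

```python
# Illustrative sketch of the high/low RP grouping.
# Assumption: participation (%) = questions answered / total RP questions posed.

TOTAL_RP_QUESTIONS = 252  # RP questions posed during the semester (from the text)
HIGH_RP_CUTOFF = 0.85     # dichotomous cut-off reported in the text


def participation_rate(answered: int, total: int = TOTAL_RP_QUESTIONS) -> float:
    """Fraction of RP questions answered, regardless of correctness."""
    if not 0 <= answered <= total:
        raise ValueError("answered must be between 0 and total")
    return answered / total


def rp_group(answered: int,
             total: int = TOTAL_RP_QUESTIONS,
             cutoff: float = HIGH_RP_CUTOFF) -> str:
    """Return 'high' when participation is at or above the cut-off, else 'low'."""
    return "high" if participation_rate(answered, total) >= cutoff else "low"


if __name__ == "__main__":
    # Hypothetical students, for illustration only.
    for answered in (252, 215, 214, 100):
        print(answered, f"{participation_rate(answered):.1%}", rp_group(answered))
```

Under this assumption, a student would need to answer at least 215 of the 252 questions to fall in the high RP group, since 214/252 is just below 85%.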
Table 1 Student participation in retrieval practice (RP), according to demographics
The Statistical Package for the Social Sciences (SPSS Statistics) was used to analyze the data. Statistical significance was set at p < 0.05. In all statistical analyses, the dependent variable was student performance on the final exam and the independent variable was student participation on TopHat©. A Shapiro-Wilk test was conducted to assess the normality of both the dependent and independent variables; it indicated that the data were non-normally distributed. Kendall’s tau-b and Spearman’s rank-order correlations, as well as linear regression, were performed to determine the existence (or lack thereof) and strength of the relationship between the dependent and independent variables.
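For readers unfamiliar with the two rank correlations named above, the following Python sketch shows how each coefficient is computed. It is a textbook illustration only and not a reproduction of the SPSS analysis: it returns the coefficients without p-values, and the function names and sample values are hypothetical.

```python
# Textbook sketch of Kendall's tau-b and Spearman's rho (coefficients only).
# Not the SPSS procedure used in the study; no p-values are computed.
from itertools import combinations
from math import sqrt


def kendall_tau_b(x, y):
    """Kendall's tau-b: concordance of pairs, with a correction for ties."""
    concordant = discordant = ties_x = ties_y = 0
    for (x1, y1), (x2, y2) in combinations(zip(x, y), 2):
        dx, dy = x1 - x2, y1 - y2
        if dx == 0 and dy == 0:
            continue            # tied on both variables: excluded from all counts
        elif dx == 0:
            ties_x += 1         # tied on x only
        elif dy == 0:
            ties_y += 1         # tied on y only
        elif dx * dy > 0:
            concordant += 1
        else:
            discordant += 1
    untied = concordant + discordant
    denom = sqrt((untied + ties_x) * (untied + ties_y))
    if denom == 0:
        raise ValueError("tau-b is undefined when a variable is constant")
    return (concordant - discordant) / denom


def average_ranks(values):
    """Ranks starting at 1; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x, y):
    """Spearman's rho: Pearson correlation computed on the average ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        raise ValueError("rho is undefined when a variable is constant")
    return cov / sqrt(var_x * var_y)


if __name__ == "__main__":
    # Hypothetical participation (%) and exam scores (%), for illustration only.
    participation = [40, 62, 85, 90, 97]
    exam_score = [58, 55, 71, 69, 80]
    print(kendall_tau_b(participation, exam_score))
    print(spearman_rho(participation, exam_score))
```

Both coefficients depend only on the ordering of the values, which is why they remain appropriate when the Shapiro-Wilk test indicates non-normal data.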