Introduction

Doctors’ clinical reasoning skills depend heavily on a relevant knowledge base (van der Vleuten and Newble 1995). Becoming an excellent doctor starts at medical school. To promote excellence in medical teaching and learning, it is necessary to find out how teaching affects learning (Ramani 2006). One may wonder whether our medical students are being optimally stimulated. Is the active learning of students sufficient, or can they be stimulated to perform even better? For this purpose, objective information on learning efficacy is needed. Assessment of learning efficacy currently involves an integrated approach of formative and summative assessments, and regular evaluation of competences, which are recorded in a student portfolio (Driessen et al. 2005; Epstein 2007). Recently, the role of interim assessments as a third type of assessment in a comprehensive assessment system of US school districts was described; such an assessment: (1) evaluates students’ knowledge and skills relative to a specific set of academic goals, typically within a limited time frame; and (2) is designed to inform decisions at the classroom level and beyond (Perie et al. 2007). Interim assessments contain both formative and summative assessment features, but unlike true formative assessments, the results of interim assessments can be meaningfully aggregated and reported at a broader level. An interim assessment reflects the level of the students’ knowledge and skills but, unlike a summative assessment, does not carry strict consequences, i.e. passing or failing. Perie et al. distinguish three general classes of purposes for interim assessments: instructional, evaluative, and predictive (Perie et al. 2007). All three assessment purposes potentially provide useful information for both students and faculty, and they may also allow further scientific elaboration.

An important goal of assessment is to optimize the capabilities of all learners and practitioners by providing motivation and direction for future learning (Epstein 2007). Assessment also drives students’ learning behaviour (Cohen-Schotanus 1999; Frederiksen 1984; van der Vleuten and Schuwirth 2005). Assessment and learning are related to varying degrees, although the specific dynamics are not yet fully understood (Boulet 2008; Handfield-Jones et al. 2002). Apart from yielding useful information, assessment itself is thought to drive, and may even help, learning: the so-called “testing effect” (Newble and Jaeger 1983). Karpicke and Roediger elegantly demonstrated, using repeated testing, the critical importance of retrieval practice in consolidating university students’ learning of a foreign language (Karpicke and Roediger 2008). A similar effect was demonstrated by the same authors in two experiments in which students took one or three immediate recall tests without feedback (Roediger and Karpicke 2006b). A positive effect of test-driven learning was recently demonstrated in a didactic conference for paediatric and emergency medicine residents (Larsen et al. 2009). Thus, assessment can be viewed as an educational tool that provides useful information for both students and faculty (Krupat and Dienstag 2009).

Until now, according to Norman et al., positive effects of assessment at the level of a medical curriculum have not been demonstrated (Norman et al. 2010). Does interim assessment also improve student performance in a non-laboratory undergraduate medical education setting? We hypothesized that interim testing of medical students results in a higher formal examination score; here, the interim assessment is used as a didactic instrument. Medical education uses a variety of settings and formats, and which educational settings lend themselves to test-enhanced learning remains to be investigated (Larsen et al. 2008). We assumed that the best learning environment in which to administer the interim assessment is a small group work session, as such sessions are considered to contribute substantially to meaningful learning (Michael 2006). Furthermore, we were interested in whether two interim assessments have additional value over a single assessment. For this purpose, we set up a prospective randomized study comparing two different arms of small groups. In the intervention arm (I), an interim assessment was provided prior to the formal course examination; in the control arm (C), no interim assessment was provided. The intervention arm was further subdivided into two arms: one with a single interim assessment (I-1) and the other with two interim assessments (I-2). The current study shows that, in a randomized setting, an interim assessment stimulates students to achieve a higher formal examination score.

Methods

Participants and setting

The study was conducted at the Radboud University Nijmegen Medical Centre with 326 medical and 91 biomedical science students who undertook a second-year Bachelor course on General Pathology. The female to male ratio of students was 3:1. The Radboud University Nijmegen Medical Centre provides a learner outcome-oriented curriculum in which each course lasts 4 weeks. The successive topics of the course on General Pathology were: (1) Principles of diagnosis and cellular damage; (2) Inflammation and repair; (3) Circulatory disorders; and (4) Tumour pathology (pathogenesis and progression). Each topic had a consistent sequence of educational activities: lecture; task-driven directed self-study in preparation for the subsequent small group work; small group work (obligatory); practical course (obligatory); interactive lecture; and non-directed self-study (see Fig. 1). The formal examination of all topics took place on the final day of the course. For the interim assessment, the small group work session on tumour pathology, “The pathogenesis of uterine cervical carcinoma”, was selected.

Fig. 1

Topic structure. Time of administration of a single interim assessment (study arm I-1) and of double interim assessments (study arm I-2) in relation to the topic structure. The scheduled time for each educational component is indicated in brackets

Ethical considerations

Formal written permission to execute the study was obtained from the course coordinator. As there is no access to a formal ethical approval process for medical education research in the Netherlands, information about the treatment of the students is provided here. This concerns the possible risks for the students, the equitability of the selection, the guarantee of privacy and confidentiality, the procedure for informed consent, and the possible safeguards to protect vulnerable populations (Eva 2009; Kanter 2009). In our opinion, participation in the interim assessments posed no risk to students. The assignment of the students to the small groups and the assignment of the groups to one of the three arms of the study were random. The privacy of the students was guarded by the study coordinator. For the study, the examination scores were linked to a student number and the identity of the students was not disclosed. The students were adequately informed of the purpose of the interim assessment and consent was obtained. We were not aware of any vulnerable population among the students that would have required safeguards. When designing the current study, the ethical principles of the World Medical Association Declaration of Helsinki were taken into account.

Intervention

Each interim assessment consisted of seven multiple-choice questions, with a maximum of four alternative answers, on the topic of tumour pathology. Ten minutes were allotted to each interim assessment. The questions were derived from a bank of 80 multiple-choice questions on tumour pathology formulated by one of the authors (DR), an expert in tumour pathology, and were validated by two independent pathologists, two independent medical educationalists, and a Master’s medical student (MOB).

The formal examination consisted of 15 multiple-choice questions and one open question relating to tumour pathology and seven open questions on the other topics. The multiple-choice questions of both the interim assessments and the formal examination were derived from the aforementioned bank of questions. The two interim assessments and the formal examination were composed of different multiple-choice questions, but the content and the level of the questions were similar.
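
As a toy illustration of this non-overlap constraint, the following Python sketch draws mutually disjoint question sets for the two interim assessments and the formal examination from a single bank (purely hypothetical code; the actual questions were selected by the authors, not sampled automatically):

    import random

    rng = random.Random(2009)   # fixed seed for reproducibility
    bank = list(range(80))      # the 80-question tumour pathology bank
    rng.shuffle(bank)

    interim_1 = bank[:7]        # 7 questions for the first interim assessment
    interim_2 = bank[7:14]      # 7 different questions for the second
    formal_mc = bank[14:29]     # 15 multiple-choice questions for the formal exam

    # The three slices are disjoint by construction: 7 + 7 + 15 = 29 unique items.
    assert len(set(interim_1) | set(interim_2) | set(formal_mc)) == 29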

Randomization

Participants were randomized into three arms comprising equal numbers of small work groups. Allocation of the intervention occurred at the level of the small work groups. The randomization was stratified for gender and study discipline, since these may influence learning behaviour and learning efficacy (Kusurkar et al. 2009). In arm I-1, students underwent an interim assessment once, i.e. at the end of the small group work session; in arm I-2, students underwent an interim assessment twice, i.e. at the beginning and at the end of the small group work session; and in arm C, students did not undergo an interim assessment (see Fig. 2).

Fig. 2

Flow chart. Study design including two intervention groups (I-1 and I-2) and one control group (C). *Number of students excluded because they did not participate in the formal examination (n = 13)
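
To make the allocation procedure concrete, the following Python sketch shows one way such a stratified, group-level randomization could be implemented (a hypothetical reconstruction; the stratum labels and group identifiers are illustrative and not taken from the study):

    import random
    from collections import defaultdict

    def randomize_groups(groups, arms=("I-1", "I-2", "C"), seed=1):
        """Assign small work groups to study arms, stratified on a
        per-group label (e.g. predominant gender and discipline)."""
        rng = random.Random(seed)
        strata = defaultdict(list)
        for group_id, stratum in groups:
            strata[stratum].append(group_id)
        allocation = {}
        for members in strata.values():
            rng.shuffle(members)
            # Deal the shuffled groups round-robin over the arms so that
            # each stratum contributes (nearly) equally to every arm.
            for i, group_id in enumerate(members):
                allocation[group_id] = arms[i % len(arms)]
        return allocation

    # Example: six hypothetical groups in two strata.
    groups = [(1, "medical"), (2, "medical"), (3, "medical"),
              (4, "biomedical"), (5, "biomedical"), (6, "biomedical")]
    print(randomize_groups(groups))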

Procedure

Students in the intervention arms were informed about the interim assessment at the small group work session. Tutors explained to the students, immediately before the interim assessment, that it was an investigation to inform the faculty about the learning outcomes of the students during the small group work. Participation in the interim assessment was voluntary, and students could stop taking the assessment at any time. They were assured that the result of the interim assessment would not be taken into account in determining the score of the formal course examination. The participation rate was 100%. Students and tutors were not informed of the content of the questions of the interim assessment. The tutors were present at the beginning of the small group work session, when one interim assessment was administered (arm I-2), and during the second hour of the session, when the other interim assessment was administered. Five different tutors guided the small group work sessions. Each tutor guided both intervention and control groups. No explicit feedback on the results was given to the students. The formal examination took place 3 days after the interim assessments.

Outcome measures

The main outcome measures were the overall score on the formal examination and the subscore on the open and multiple-choice questions on tumour pathology. Both outcome measures were expressed on a scale from 1 to 10 points. A subgroup analysis by gender and discipline was performed. As the interim assessment is intended as a didactic instrument, not a predictive one, the scores of the interim assessments were not compared with those of the formal examination.

Statistical analysis

Linear mixed models were used to account for the dependence caused by clustering of the students into small groups; the small group was used as a random factor. The analysis was performed according to the intention-to-treat principle. After the primary analysis, a subgroup analysis was performed by gender and discipline.
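
For illustration, such a model could be specified as follows in Python with statsmodels (a minimal sketch assuming a hypothetical per-student data file; the column names score, arm, gender, discipline and group are our own, not those of the study dataset):

    import pandas as pd
    import statsmodels.formula.api as smf

    # One row per student: formal examination score, study arm,
    # stratification variables, and the small group the student belonged to.
    df = pd.read_csv("exam_scores.csv")

    # Linear mixed model: fixed effects for study arm and the stratification
    # variables; a random intercept per small group accounts for clustering.
    model = smf.mixedlm("score ~ C(arm) + C(gender) + C(discipline)",
                        data=df, groups=df["group"])
    result = model.fit()
    print(result.summary())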

Results

Main results

Students who underwent an interim assessment once or twice (arms I-1 and I-2, respectively) showed a 0.29 point (scale 1–10) higher overall score on the formal examination than the control group C (p = 0.037). For the questions in the formal examination related to the topic of tumour pathology, the score was 0.47 points higher (p = 0.007), whereas it was 0.17 points higher for the questions on the other topics of general pathology. The accompanying effect sizes and standard deviations are reported in Table 1. The results of the mixed model analysis are reported in Table 2. No differences in formal examination score were found between arms I-1 and I-2 (Table 3).

Table 1 Outcome measures (scale 1–10) including standard deviations and effect sizes
Table 2 Results of the mixed model analysis
Table 3 Results of the formal examination per intervention arm

No student refused to participate. Students who undertook the interim assessment but did not take the formal examination were excluded (n = 13). A total of 404 students were included in the analysis. There was no significant difference in dropouts between the three study arms.

Subgroup analysis

Female students scored significantly higher on the formal examination than male students (0.65 points, p < 0.001). Medical students scored 0.65 points higher than biomedical science students (p < 0.001). The gain conferred by the interim assessment did not differ between these subgroups.

Discussion

Main findings

An interim assessment during a small group work session in a randomized controlled trial setting increased the students’ formal examination score. This effect was similar for students who took the interim assessment once and for those who took it twice. The increase amounted to almost 0.5 points on a scale of 1–10 for those questions in the formal examination that were related to the questions in the interim assessment. The gain conferred by the interim assessment did not differ by gender or discipline.

Strengths

The study design, a prospective randomized controlled trial with stratification for gender and discipline, can be considered robust, because selection bias, information bias and confounding are highly unlikely. The primary outcome of the study, i.e. the score on the formal examination, is unequivocal. The data were subjected to a linear mixed-model analysis to account for the dependence caused by clustering of the students in small work groups. The multiple-choice questions in the interim assessments and the formal examination were validated for both medical content and educational quality. Based on these considerations, we consider the results to be reliable.

The control group was not engaged in an alternative interim assessment, as this would have distracted from the small group work. Instead, the students in the control group could spend time discussing the topic of the small group work while the intervention groups took the interim assessment. Therefore, the total exposure time to the subject matter was equal for the intervention and control groups.

The study setting was directly related to educational practice, i.e. it took place during an ongoing regular biomedical Bachelor course, and it did not interfere with educational activities. The tutors were blinded to the content of the interim assessment. All tutors guided at least one student group from each of the three study arms. Both students and tutors accepted the interim assessment well and perceived it as a natural component of the small group work session. Based on regular evaluations, the course on General Pathology is highly appreciated by the students and the faculty, and can be considered to follow current best practice. We therefore feel that the study is representative of current best educational practice.

Limitations

The generalizability of our findings is currently limited, as this is a single study in a single curriculum. To increase the level of evidence and to investigate a broader application of the interim assessment, more similar studies are needed.

We were not able to demonstrate an additional learning effect of a second interim assessment in the current study. This might be caused by the short interval between the two interim assessments, as will be discussed later.

If our finding that participation in an interim assessment prior to a formal examination increases the formal examination score is confirmed by other studies, this would mean that the students in the interim assessment arms were at an advantage over the students in the control group. Therefore, in future studies, the control group should also be subject to an interim assessment, for example by using a cross-over study design.

Thirteen students (3.1%) could not be included in our analysis because they did not take part in the formal examination. Among the dropouts, the male:female ratio was 5:8 (overall ratio 1:2) and the biomedical:medical ratio was 4:9 (overall ratio 1:4). The dropouts were distributed equally over the three study arms; it is therefore unlikely that this affected our results.

Interpretation of the main findings

As the students were not aware of our study hypothesis, i.e. that participating in an interim assessment would lead to a higher formal examination score, we assume that they were stimulated, or even challenged, by the interim assessment as such. In taking it, they probably engaged in retrieval practice that consolidated their learning, a manifestation of the testing effect (Karpicke and Roediger 2008). The underlying mechanisms of this effect may include: (1) enhanced motivation of the learners; (2) directing them to focus on relevant issues; and (3) giving them an opportunity to train for the formal course examination (Larsen et al. 2008). Although the positive effect on the formal examination was relatively small, we feel that it has educational relevance because it could have a clear influence on the outcome of the summative examination, i.e. pass or fail. In addition, it demonstrates that students in an ongoing curriculum (i.e. a realistic setting) can be stimulated by an interim assessment to perform better.

The fact that the positive effect on the formal examination score did not differ between one and two interim assessments indicates that a second interim assessment taken within a short interval (i.e. less than 2 h) of the first adds no value to the learning effect. It is therefore likely that such an additional effect requires a longer interval between assessments. Karpicke and Roediger demonstrated increased benefits of repeated testing when tests are distributed over time (Karpicke and Roediger 2007). Another factor may be feedback, which seems to be a prerequisite for the added value of multiple assessments (Larsen et al. 2008), as will be discussed later.

Comparison with other studies

An interim assessment is a relatively new educational tool that was recently developed in the context of secondary schools in the USA (Perie et al. 2007). Repeated testing during a course, which leads to better retention of information, could be considered a series of interim assessments. Poljicanin et al. demonstrated a positive effect of daily mini quizzes on students’ performance in an anatomy course (Poljicanin et al. 2009). They conducted a total of 34 quizzes during a whole academic year, whereas in our study we provided only one or two assessments in a 4-week course. It remains to be investigated how many assessments per timeframe would yield an optimal increase in performance without interfering with the regular course programme. Karpicke and Roediger demonstrated that repeated testing leads to better long-term recall than single testing (Karpicke and Roediger 2008). In the current study, we were not able to reproduce this result, as there was no significant difference between the intervention groups taking one or two interim assessments. As stated before, this may be explained by the fact that both tests were administered in the same small group work session, with only 2 h in between. It would be interesting to investigate whether the timing of the interim assessment, i.e. at the beginning or at the end of the small group work session, matters in this respect.

Larsen and colleagues recently described improved long-term retention by medical residents following repeated testing in a real-life educational setting (Larsen et al. 2009). In contrast to our study, the testing was followed by feedback, and the findings were measured at a final recall interval of 6 months. Our findings suggest that even without such feedback, retention of information, as measured by the formal examination score, occurs. It is conceivable that the increase in the score might have been higher had we given feedback, as indicated by the literature (Larsen et al. 2008; Roediger and Karpicke 2006a; Wood 2009). For the sake of clarity of the study design, we chose not to include explicit feedback in this study, but we have included it in a follow-up study using a cross-over design. In this new study, we have carefully considered the nature, source and timing of feedback, as suggested by Veloski et al. (2006).

Conclusions

An interim assessment during a small group work session was found to stimulate students to learn better and to increase their formal examination score. The current study supports the efficacy of the testing effect in an ongoing medical curriculum and the view that assessment can be seen as an educational tool (Krupat and Dienstag 2009). An interim assessment may enrich the repertoire of small group work formats, as suggested by Michael, in order to further increase meaningful learning (Michael 2006). It also implies that, within our current educational best practice, students can still be challenged to promote excellence in medical education. Further randomized controlled studies assessing the frequency of testing and the addition of feedback are needed to optimize the test-enhanced increase in student performance in a realistic educational setting.