Introduction

To date, scholars are divided on which examination type, the in-class examination (ICE) or the take-home examination (THE), is more conducive to positive student outcomes (Bengtsson, 2019; Durning et al., 2016). THEs are typically open-book examinations taken at a location of the student’s choice, whereas ICEs are (mostly) closed-book examinations completed on site. A systematic review did not find a clear-cut advantage for either ICEs or THEs, but rather suggested that each form serves distinct pedagogical purposes and offers distinct benefits (Bengtsson, 2019). THEs are expected to fit well with the assessment of higher-order cognitive skills in Bloom’s taxonomy, such as analyzing and evaluating (Krathwohl, 2002). Some argue that because students know they will have access to the materials during the examination, they feel less pressure to memorize and therefore have more resources for deeper engagement with the material (Zoller & Ben-Chaim, 1989). However, the review also showed that ICEs are well suited to testing the lower levels of Bloom’s taxonomy (e.g., remembering and applying) and are better at safeguarding against student cheating than THEs (Bengtsson, 2019). Empirical studies comparing the impact of these two examination forms on academic outcomes and student wellbeing have yielded mixed results (Bengtsson, 2019; Durning et al., 2016). Some find that ICEs are linked to higher examination grades (Agarwal & Roediger, 2011; Moore & Jensen, 2007; Spiegel & Nivette, 2023), while others suggest an advantage for (versions of) THEs in examination grades (Gharib et al., 2012; Wachsman, 2002) and knowledge retention (Johanns et al., 2017). Still other studies find no statistically significant difference in course grades (Duncan, 2007; Michael & Custer, 2018) or knowledge retention (Agarwal et al., 2008; Gharib et al., 2012).

The current study addresses three limitations of the existing evidence base. First, relatively few studies have compared long-term knowledge retention between ICEs and THEs. Those that do tend to test retention a few days or a few weeks after the examination (Haynie, 2003; Moore & Jensen, 2007; Nsor-Ambala, 2020; Rich, 2011), usually in the form of a pop quiz or a final course ICE (Agarwal, 2009; Nsor-Ambala, 2020). Some scholars have argued that a retrieval test administered after a longer delay is crucial for assessing long-term retention (Rummer et al., 2019). Longer time lags are commonly used in studies of knowledge retention in relation to other pedagogical factors, such as teaching and learning styles (Herzig et al., 2003; Taglieri et al., 2017). Only two studies known to us have examined the relationship between examination form and long-term knowledge retention using a lag exceeding eight weeks (Rummer et al., 2019; Spiegel & Nivette, 2023).

Second, while variations of THEs (such as cheat sheets) have been suggested to reduce anxiety and stress (Nsor-Ambala, 2020; Zoller & Ben-Chaim, 1989), this effect appears to depend heavily on the examination context (Nsor-Ambala, 2020). Factors such as the location and the duration of the examination may shape how the examination relates to student wellbeing (Bengtsson, 2019; Durning et al., 2016). The current study examines the relationship between examination form and wellbeing for THEs taken at home under highly time-restricted conditions. Time restrictions appear to be important for reducing student cheating and encouraging student preparation. However, it is not clear whether these conditions undermine the wellbeing advantage of THEs suggested in the literature.

Third, studies in recent years have had to account for the impact that the COVID-19 pandemic has had on society and on learning in higher education in particular. Studying remotely, including remote testing, became the norm. Studies comparing student outcomes during this period were likely impacted by these distressing conditions. Research has shown that students’ wellbeing was severely negatively impacted by the COVID-19 pandemic and the lockdown measures (Barbour & van Meggelen, 2023; Wang et al., 2020). Due to the distressing circumstances, it is difficult to assess whether wellbeing outcome differences found in studies comparing pre-COVID-19 ICE cohorts with COVID-19 THE cohorts can be attributed to the examination, the pandemic, or a combination of the two.

The current study aims to investigate to what extent examination form (ICE or THE) is associated with students’ academic performance and wellbeing. We assume that the alignment between course goals, activities, and assessment plays a relatively larger role in shaping this relationship than contextual differences, such as higher education settings and cultural contexts. Existing systematic reviews do not tend to attribute discrepancies in outcomes to variation in study contexts (Bengtsson, 2019; Johanns et al., 2017). However, this does not rule out a potential impact of institutional context on the study findings. This study follows a bachelor course and a master course at a Dutch university over four academic years. The bachelor course was run twice with an ICE and twice with a THE, while the master course was run once with an ICE and three times with a THE. These cohorts span the COVID-19 pandemic, from the pre-pandemic period to the period of emergence from the pandemic. This setup allows us to compare the two examination forms over multiple cohorts, at different academic levels, and in different COVID-19 contexts. Both short-term outcomes (i.e., examination grades) and long-term outcomes (i.e., a knowledge retention test 4–6 months after the course) are measured. This design allows us to at least partially disentangle the impact of examination form from that of the restrictions imposed by the pandemic.

Literature review

Short- and long-term academic performance

Scholars hold mixed views on how examination form affects student academic performance. Scholars supporting the use of ICEs argue that when students know they will have access to the examination materials, they attend lectures less frequently and are more likely to avoid or reduce studying in preparation for the examination (Moore & Jensen, 2007). ICEs thus require students to prepare more rigorously and more deeply before the examination, which should in turn result in higher examination grades. Studies show that students often do not know what to expect of a THE (Er et al., 2023) and may therefore perform worse on such an examination. Students who are told to expect a THE have been shown to underprepare and underperform when given an ICE instead (Agarwal & Roediger, 2011). Following this line of reasoning, a student taking an ICE is likely to prepare more thoroughly and, in turn, is expected to perform better on a knowledge retention test than a student who has taken a THE (Moore & Jensen, 2007).

Proponents of THEs argue that they are more conducive to academic performance than ICEs, especially for examination grades (Broyles et al., 2005; Gharib et al., 2012). First, they suggest that giving students access to the materials during the examination relieves the pressure to memorize and thereby reduces student anxiety (Gharib et al., 2012). Reduced anxiety is in turn expected to yield higher test performance (Vitasari et al., 2010). Additionally, when students do not have to memorize the materials, they are expected to have more time for deeper engagement with the materials and to cast a wider net by studying additional sources (Michael & Custer, 2018; Theophilides & Koutselini, 2000). This should hold especially when the examination tests higher-order cognitive skills, such as problem solving, and when students are given sufficient time to engage with the materials during the THE. Time-restricted THEs have been shown to be especially challenging for weaker students, who do not have enough time to find the answers within the allotted time (Boniface, 1985). As such, the expectation is that students who take a THE will have engaged more deeply with the materials and will therefore perform better and retain more knowledge after the course is completed. Interestingly, the arguments favoring ICEs and those favoring THEs both attribute better academic performance to deeper engagement with the course materials in preparation for the examination (Nsor-Ambala, 2020). The format of the THE can therefore influence the extent to which students engage in deeper learning. For instance, group THEs have been linked to more free-riding behavior (Hall & Buzwell, 2013). As mentioned above, a relatively short examination duration can inhibit students’ ability to engage deeply with the materials during the examination. However, a longer THE can increase the likelihood of cheating (Henderson et al., 2022).

Student wellbeing

Studies exploring how examination forms relate to student wellbeing assess a variety of outcomes, such as anxiety, stress, satisfaction with the examination, and happiness (Durning et al., 2016). Students identify assessment as an important barrier to their wellbeing and report that it is one of the key areas in which they receive insufficient mental health support from staff (Lister et al., 2023). In general, students experience more anxiety and threat to their self-concept when they perceive a test to be high stakes, irrespective of examination form (Jones et al., 2021). Regarding the relationship between examination form and student wellbeing, there is no clear advantage for either ICEs or THEs, and various student and contextual characteristics may shape the relationship. For instance, students who already experience lower wellbeing before the examination may report a greater impact of assessment on their wellbeing (Lister et al., 2023).

Research suggests that THEs allow students to consult the materials as needed from the comfort of their homes, which has been linked to lower anxiety compared to preparing for an ICE (Zoller & Ben-Chaim, 1989). Additionally, some scholars suggest that traditional testing forms, such as a time-restricted ICE, do not correspond sufficiently to the labor market experiences students will have after graduation, and that such an artificial working environment may heighten students’ experience of pressure (Jones et al., 2021). A qualitative study found that students and staff perceived open-book examinations to be more appropriate for the digital age, in which the focus lies on finding information rather than retaining it (Jones et al., 2021). Some studies have found that THEs are associated with reduced test anxiety (Akulwar-Tajane et al., 2021; Gharib et al., 2012; Tao & Li, 2012; Weber et al., 1983), greater satisfaction with the examination (Er et al., 2023), and a more positive learning experience (Şenel & Şenel, 2021; Tao & Li, 2012).

Other research suggests that ICEs may be linked to more positive wellbeing outcomes than THEs. The underlying reasoning is that students report not knowing what to expect from an open-book examination and may feel even more anxious if they have little experience with this examination form (Er et al., 2023; Jones et al., 2021). Students tend to expect THEs to be more difficult and effortful, which in turn can translate into negative wellbeing outcomes (Slack & Priestley, 2023). While a review by Bengtsson (2019) concluded that THEs are more favorable for student wellbeing, another review suggests that the anxiety levels reported by students taking THEs are not lower than those of students taking ICEs, and that the advantages associated with THEs may therefore be overstated (Durning et al., 2016).

The potential role of the COVID-19 pandemic

As a result of the pandemic, many learning activities took place online. In many institutional settings, this meant a shift from ICEs to THEs. Studies exploring the impact of these learning and testing conditions suggest that, under these circumstances, THEs can adversely affect student outcomes. A study in a business school found a weaker association between students’ academic performance in their first year (ICE) and their second year (THE) (Opstad & Pettersen, 2022), suggesting that stronger students did not perform as well as expected on a THE. A study in a medical school comparing a pre-COVID-19 cohort who took an ICE with a COVID-19 cohort who took a THE found no differences in grade average or distribution between these cohorts (Alegre-Martínez et al., 2023). Other evidence suggests that scores on open-book examinations were somewhat lower than on closed-book examinations during the pandemic (Hong et al., 2023).

Some scholars found that online learning and THEs implemented during the pandemic were associated with a greater sense of flexibility (Slack & Priestley, 2023) and that students were generally more satisfied with these examination forms (Er et al., 2023). At the same time, however, students perceived an increase in effort compared with traditional study methods (Slack & Priestley, 2023). There is evidence that pre-examination anxiety levels were similar for the two examination forms, meaning that THEs during the pandemic did not necessarily provide students with emotional relief (Hong et al., 2023). Anxiety may not have declined because students did not sufficiently know what to expect from THEs (Er et al., 2023).

Overall, studies conducted during the pandemic do not seem to yield a very different picture from studies comparing the two examination forms before the pandemic. However, results regarding the relationship between examination form and student wellbeing measured before and during the pandemic should be interpreted with caution. Any differences (or lack of differences) may be linked to the onset of the pandemic, the change in examination form, or a combination of the two. While it is not possible to fully disentangle the two conditions, the current study captures changes in examination form across multiple cohorts reflecting different stages of the COVID-19 pandemic: pre-pandemic (2019–2020), the onset of the pandemic (2020–2021), the continuing pandemic with partial remote learning (2021–2022), and emergence from the pandemic with no restrictions (2022–2023).

Methods

The data for this study come from four waves of surveys conducted among students taking a bachelor or master course within a social science program at a university in the Netherlands (see also Spiegel & Nivette, 2023). The bachelor course was an upper-level course open to students in the social science faculty as well as to international exchange students. The master course was an elective in a 1-year master program and was also open to students from other master programs within the social science faculty. Grades were awarded on a scale from 0 to 10, and students needed at least 5.5 to pass. Students who did not pass the examination at the first attempt may have been eligible for a retake examination based on pre-specified criteria (e.g., participation). The final grade in each course was the average of the examination grade and the grades for other assessed activities (e.g., a paper and/or presentations). Students could not compensate for a failing grade on one assessment with a higher grade on another: they had to pass both the examination and the other assessments in order to pass the course.

The four cohorts spanned the academic years 2019–2020 to 2022–2023. In the first cohort, both courses used an ICE. In the bachelor course, students completed a THE in the following two cohorts (2020–2021 and 2021–2022) but an ICE in the final cohort (2022–2023). In the master course, students completed a THE in the remaining three cohorts (2020–2021, 2021–2022, and 2022–2023). The details of each examination form and cohort context are summarized in Table 1. While the form of the examinations changed over time, the course content and materials remained largely the same across cohorts.

Table 1 Overview of examination forms across four cohorts for the bachelor and master courses

In each wave of data collection, all students who passed the courses were approached in April or May of the same academic year, approximately 4 months (master) or 6 months (bachelor) after the course ended. The survey consisted of two parts. In the first part, students were asked about their experiences during the course, including their development of academic skills, their perceived workload, and their wellbeing. In the second part, students were asked to complete a 10-item multiple-choice test covering the lecture and reading materials from their respective courses. In both courses, the knowledge test questions remained largely similar across cohorts: some questions were changed where content or reading materials had been updated, but most of the content remained the same.

In the bachelor course, the survey response rates were 37.7% (n = 50 out of 148) in the 2019–2020 cohort, 35.3% (n = 48 out of 138) in 2020–2021, 26.5% (n = 39 out of 147) in 2021–2022, and 26.5% (n = 41 out of 155) in 2022–2023. In the master course, the response rates were 58.6% (n = 34 out of 58) in 2019–2020, 57.7% (n = 30 out of 55) in 2020–2021, 61.1% (n = 33 out of 54) in 2021–2022, and 45.8% (n = 22 out of 48) in 2022–2023. Students who completed the survey tended to report higher examination grades than the overall course average for that cohort year (see Table 7 in the Appendix). In each cohort and course, students were offered the chance to enter a raffle for one of several 25-euro vouchers.

Measures

In order to capture different dimensions of academic performance and achievement, several measures of academic outcomes were included. These include short-term (self-reported grades) and long-term (knowledge retention test) knowledge of the materials, as well as the development of skills. The development of academic skills, including problem-solving, teamwork, and writing, was a key learning goal within both courses. In order to capture different dimensions of students’ wellbeing during the course, two measures of self-reported wellbeing were included. The first was an overall wellbeing scale measuring a student’s general mental health during the course, and the second measured feelings of stress specifically attributable to the course (e.g., workload). In this way it was possible to capture broader mental wellbeing, which might stem from multiple sources, as well as wellbeing related to the course structures and materials themselves.

Academic outcomes

Knowledge retention

For each course, students were asked to complete a 10-item multiple-choice test covering the readings and lecture materials. The tests were designed by the course coordinators to assess students’ knowledge of key concepts, theories, and findings from the course. Students were encouraged not to look back at their course materials, so that the test would reflect how much they still remembered. The number of correct answers was summed into a score ranging from 0 to 10, which serves as an observed score on the same 0–10 scale as an examination grade.
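The scoring rule above can be sketched as follows. This is a minimal illustration, not the coordinators’ actual procedure: the answer key, the student responses, and the decision to score an unanswered item (None) as incorrect are all hypothetical assumptions.

```python
# Hypothetical answer key for a 10-item multiple-choice test (not the study's items).
ANSWER_KEY = ["b", "d", "a", "c", "a", "b", "d", "c", "a", "b"]


def knowledge_score(responses, key=ANSWER_KEY):
    """Number of correct answers, from 0 to len(key).

    Assumption: an unanswered item (None) is scored as incorrect.
    """
    if len(responses) != len(key):
        raise ValueError("expected exactly one response per item")
    # Each correct answer contributes True (= 1) to the sum.
    return sum(given == correct for given, correct in zip(responses, key))


student = ["b", "d", "c", "c", "a", None, "d", "a", "a", "b"]
print(knowledge_score(student))  # 7 out of 10
```

Because each item is worth one point and there are ten items, the sum already lies on the 0–10 range used for examination grades, so no rescaling is needed.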

Self-reported grades

Students were asked to report their highest examination grade and their overall course grade. The examination grade included the retake examination, if applicable. The overall course grade was a weighted combination of the examination and any additional assessments in the course.

Skill development

Skill development was measured using the Generic Skills Scale (GSS; Byrne & Flood, 2003), which contains six items such as “the course developed my problem-solving skills” and “the course helped me develop my ability to work as a team member.” Responses were given on a 5-point Likert-type scale ranging from “strongly disagree” to “strongly agree.” Cronbach’s alphas were 0.74, 0.78, 0.70, and 0.69 for the bachelor course and 0.69, 0.67, 0.63, and 0.73 for the master course, for the 2019–2020, 2020–2021, 2021–2022, and 2022–2023 cohorts, respectively.
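The reliability coefficients reported for this and the following scales are Cronbach’s alphas. As a minimal sketch of the standard formula, alpha = k / (k − 1) × (1 − sum of item variances / variance of the total score), the function below assumes complete responses (no missing values) and uses made-up scores rather than the study’s data.

```python
from statistics import variance


def cronbach_alpha(rows):
    """Cronbach's alpha for a scale.

    rows: one list of k item scores per respondent (no missing values assumed).
    alpha = k / (k - 1) * (1 - sum of item variances / variance of total score)
    """
    k = len(rows[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    items = list(zip(*rows))  # transpose: one tuple of scores per item
    item_var_sum = sum(variance(item) for item in items)
    total_var = variance([sum(row) for row in rows])  # variance of the summed scale
    return k / (k - 1) * (1 - item_var_sum / total_var)


# Made-up responses from three respondents on two items: alpha is about 0.667.
print(cronbach_alpha([[1, 2], [2, 1], [3, 3]]))

# Perfectly parallel items give alpha = 1 (up to floating-point rounding).
print(cronbach_alpha([[1, 1, 1], [2, 2, 2], [3, 3, 3], [4, 4, 4]]))
```

Sample variances are used throughout; because the same variance estimator appears in the numerator and the denominator, the choice of estimator does not change the result.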

Wellbeing outcomes

Student wellbeing

Wellbeing was measured using the World Health Organization’s five-item Well-Being Index (WHO-5; World Health Organization, 1998). For example, students were asked how often they had felt “cheerful and in good spirits” and “calm and relaxed,” on a six-point scale ranging from “all of the time” to “at no time.” Higher scores therefore reflect lower wellbeing during the course. Cronbach’s alphas were 0.83, 0.86, 0.78, and 0.86 for the bachelor course and 0.82, 0.82, 0.78, and 0.89 for the master course, for the 2019–2020 to 2022–2023 cohorts, respectively.
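Because the scale runs from “all of the time” to “at no time,” the direction of scoring matters when interpreting results. The sketch below assumes, hypothetically, that the six options were coded 1 to 6 in that order and averaged across the five items, which yields a score where higher means lower wellbeing, as stated above. The study’s exact coding is not reported here. For comparison, the conventional WHO-5 scoring codes “at no time” as 0 and “all of the time” as 5, sums to a raw score of 0–25, and multiplies by 4 for a 0–100 percentage, so higher conventional scores mean better wellbeing.

```python
# WHO-5 response options, ordered as described in the text.
LABELS = [
    "all of the time",
    "most of the time",
    "more than half of the time",
    "less than half of the time",
    "some of the time",
    "at no time",
]

# Assumption: options coded 1..6 in this order, so higher = lower wellbeing.
LOW_WELLBEING_CODE = {label: code for code, label in enumerate(LABELS, start=1)}


def low_wellbeing_score(answers):
    """Mean item code (1-6) across the five items; higher means lower wellbeing."""
    codes = [LOW_WELLBEING_CODE[a] for a in answers]
    return sum(codes) / len(codes)


def standard_who5_raw(answers):
    """Conventional WHO-5 raw score (0-25): 5 = all of the time, 0 = at no time."""
    return sum(5 - LABELS.index(a) for a in answers)


answers = [
    "most of the time",
    "some of the time",
    "at no time",
    "more than half of the time",
    "most of the time",
]
print(low_wellbeing_score(answers))    # 3.6
print(standard_who5_raw(answers) * 4)  # 48 on the conventional 0-100 scale
```

Under this assumed coding the two scores are linearly related (conventional raw score = 30 − 5 × mean code), so they carry the same information but point in opposite directions.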

Study workload stress

We used eight items adapted from the Job Stress Scale (Shukla & Srivastava, 2016) to measure perceived workload stress during the course. Students were asked to what extent they agreed with statements such as “I had a high study load and feared I had very little time to do it” and “I felt that I never took time off.” Responses were given on a five-point Likert-type scale ranging from “strongly disagree” to “strongly agree.” Internal consistency was good across cohorts and courses, with Cronbach’s alphas of 0.82, 0.90, 0.88, and 0.89 for the bachelor course and 0.91, 0.89, 0.84, and 0.86 for the master course, for the 2019–2020 to 2022–2023 cohorts, respectively.

Demographic characteristics

Two additional demographic variables were included: sex at birth (0 = male, 1 = female) and whether the student was Dutch (coded 0) or international (coded 1). Sex was included because previous research has shown gender differences in student wellbeing and experiences of stress (Gestsdottir et al., 2021; Mikolajczyk et al., 2008; Van de Velde et al., 2010). International students also face additional challenges that may influence their academic and wellbeing outcomes, as they may experience greater barriers related to language proficiency, loneliness, and/or adjustment to a new academic context than domestic students (Alharbi & Smith, 2018).

Ethics

The study was approved by the ethics review board of the Faculty of Social and Behavioural Sciences at Utrecht University. Students were provided with information about the study and asked to indicate their consent prior to completing the survey by selecting the option “I consent.” The initial invitation to students was content-centered (Zhang et al., 2017); that is, it outlined potential intrinsic incentives to participate, such as contribution to knowledge on students’ experiences with examinations and wellbeing, with a short mention of compensation. Students were informed that their answers would remain confidential and that they were free to quit the survey at any time without consequences.

Analytic strategy

To explore the extent to which academic and wellbeing outcomes differed between cohorts who completed an ICE and those who completed a THE, a series of univariate and multivariate ordinary least squares (OLS) regressions was conducted for each outcome. First, each outcome was regressed on cohort, with the first cohort (ICE 2019–2020) as the reference category, to assess the extent to which outcomes differed by cohort year. Second, additional multivariate regressions of knowledge retention were conducted, including all other variables and demographics as controls. The data and statistical code for this study are available on the open data platform DANS (https://doi.org/10.17026/SS/ZB1GOU).
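The first step, regressing an outcome on cohort with the first cohort as the reference category, can be sketched with dummy coding. This is a minimal illustration using NumPy least squares and made-up knowledge scores for three hypothetical cohorts; it is not the authors’ code, which is available on DANS. The helper names `cohort_design` and `ols` are hypothetical.

```python
import numpy as np


def cohort_design(cohorts, reference):
    """Design matrix: an intercept plus one 0/1 dummy per non-reference cohort."""
    levels = [c for c in dict.fromkeys(cohorts) if c != reference]  # keeps order
    columns = [np.ones(len(cohorts))]
    columns += [np.array([1.0 if c == level else 0.0 for c in cohorts]) for level in levels]
    return np.column_stack(columns), levels


def ols(X, y):
    """OLS estimates with conventional (homoskedastic) standard errors."""
    y = np.asarray(y, dtype=float)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    n, p = X.shape
    sigma2 = resid @ resid / (n - p)  # residual variance
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))
    return beta, se


# Made-up knowledge scores (0-10) for three hypothetical cohorts of three students.
cohorts = ["ICE 2019-2020"] * 3 + ["THE 2020-2021"] * 3 + ["THE 2021-2022"] * 3
knowledge = [7, 8, 9, 6, 7, 8, 5, 6, 7]

X, levels = cohort_design(cohorts, reference="ICE 2019-2020")
beta, se = ols(X, knowledge)
for name, b, s in zip(["intercept"] + levels, beta, se):
    print(f"{name}: b = {b:.2f}, SE = {s:.2f}")
```

With only cohort dummies in the model, the intercept equals the mean of the reference cohort (8 here), and each coefficient b is simply that cohort’s mean minus the reference mean (−1 and −2 here). This is why the coefficients in a cohort-only regression can be read as differences from the first cohort.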

Results

Descriptive results

Tables 2 and 3 present the means, standard deviations, and bivariate correlations for the pooled bachelor and master course samples, respectively. Overall, both samples are largely female (over 70%) and largely identified as Dutch. This is broadly in line with the gender distribution within the social sciences in the Netherlands (i.e., ~70% female; see Footnote 1). In both courses, the knowledge score was correlated with the student’s self-reported examination grade (r = 0.31 for the bachelor course and r = 0.33 for the master course, both p < 0.01). Low wellbeing was correlated with a lower overall grade, lower skill development, and higher perceived workload.

Table 2 Means, standard deviations and correlations for the bachelor course
Table 3 Means, standard deviations and correlations for the master course

Differences between ICEs and THEs

As a first step, the mean values of each academic and wellbeing outcome were visualized for the cohorts with ICEs and those with THEs. Figure 1 shows that, with the exception of knowledge scores, there are few differences in outcomes by examination form. However, because this figure pools cohorts and courses, it may mask differences between them. Therefore, the mean values of each academic and wellbeing outcome were also plotted by cohort for each course (see Figs. 2 and 3). The shaded areas in Figs. 2 and 3 indicate the cohorts in which THEs were implemented. Figure 2 shows that, for the bachelor course, there appear to be no substantial changes in knowledge score, examination grade, or overall grade across ICE and THE cohorts. In the master course, however, knowledge scores show a noticeable decline across successive cohorts.

Fig. 1 Mean values for each academic and student outcome by examination form for both master and bachelor courses

Fig. 2 Mean academic outcomes for bachelor and master courses across cohorts. Shaded areas indicate cohorts when the assessment was in the form of a take-home examination

Fig. 3 Mean student outcomes for bachelor and master courses across cohorts. Shaded areas indicate cohorts when the assessment was in the form of a take-home examination

Figure 3 again suggests that there were few changes across cohorts and courses in relation to skill development. Wellbeing outcomes reported in the master course also did not appear to change substantially over time. In the bachelor course, more variation in wellbeing outcomes is observed, as students reported relatively lower wellbeing and higher workload in the second cohort (2020–2021).

Next, it was tested whether these differences across cohorts were statistically significant for each course. The results for the bachelor course are presented in Table 4 and those for the master course in Table 5. For the bachelor course, most outcomes show no significant changes, with the exceptions of examination grade and low wellbeing. Students reported higher examination grades (b = 0.40, SE = 0.16) in the last cohort (ICE 2022–2023) than in the first cohort (ICE 2019–2020), and lower wellbeing (b = 0.51, SE = 0.17, on the low-wellbeing scale) in the second cohort (THE 2020–2021) than in the first. There were no significant differences in knowledge scores, overall grades, skills, or workload across cohorts.

Table 4 Ordinary least squares regression for academic and wellbeing outcomes on exam form cohort (bachelor course)
Table 5 Ordinary least squares regression for academic and wellbeing outcomes on exam form cohort (master course)

Table 5 shows that, in the master course, students in each subsequent cohort scored lower on the knowledge test than those in the first cohort. Recall that the master course implemented a THE from the second cohort onward. These differences are largest for the third (b = −0.97, SE = 0.43) and fourth (b = −1.91, SE = 0.48) cohorts compared to the first. Figure 4 illustrates these effects by plotting the marginal means on the knowledge retention test for each cohort. In addition, Table 5 shows that examination grades were significantly lower (b = −0.65, SE = 0.25) in the fourth cohort (THE 2022–2023) than in the first cohort. No significant differences were found for the other outcomes across cohorts.

Fig. 4 Predicted mean knowledge score by cohort and examination form for the master course. ICE, in-class examination; THE, take-home examination

To assess the sensitivity of these results, additional analyses of knowledge scores were conducted for both courses, controlling for skills, examination grades, workload, wellbeing, and demographic variables. The results remain largely the same (see Table 6). In both courses, a student’s performance on the examination was positively related to their performance on the knowledge retention test (bachelor: b = 0.62, SE = 0.17, p < 0.001; master: b = 0.48, SE = 0.18, p < 0.001).
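What “controlling for” examination grade does to a cohort coefficient can be illustrated with a minimal sketch. The data below are made up and noise-free (knowledge = 1 + 0.6 × grade − 0.8 × THE), constructed so that the hypothetical THE cohort also has lower examination grades; they are not the study’s data, and the helper name `ols_coefficients` is hypothetical.

```python
import numpy as np


def ols_coefficients(X, y):
    """OLS coefficients via least squares."""
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta


# Made-up, noise-free data: knowledge = 1 + 0.6 * grade - 0.8 * THE.
grade = np.array([7, 8, 9, 8, 6, 7, 8, 7], dtype=float)
the = np.array([0, 0, 0, 0, 1, 1, 1, 1], dtype=float)  # 1 = hypothetical THE cohort
knowledge = 1 + 0.6 * grade - 0.8 * the

ones = np.ones_like(grade)
unadjusted = ols_coefficients(np.column_stack([ones, the]), knowledge)
adjusted = ols_coefficients(np.column_stack([ones, the, grade]), knowledge)

print(round(unadjusted[1], 2))  # -1.4: raw cohort gap, partly due to lower grades
print(round(adjusted[1], 2))    # -0.8: cohort gap among students with equal grades
```

In this toy example the unadjusted gap (−1.4) combines the direct cohort difference (−0.8) with the part explained by the THE cohort’s lower grades (0.6 × −1). This only illustrates the mechanics; in the study itself, the cohort results remained largely the same after controls were added.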

Table 6 Ordinary least squares regression for knowledge retention score on exam form cohort and covariates

Discussion

The current study investigated the relative impact of ICEs and THEs on students’ academic and wellbeing outcomes across cohorts spanning the period before, during, and after the COVID-19 pandemic. Regarding academic performance, our results show that, in line with previous research (Doomernik et al., 2017), students who performed well on the examination also performed better on the knowledge retention test four to six months after completing the course. Furthermore, our findings suggest that ICEs might have a slight advantage over THEs with respect to academic performance (Agarwal & Roediger, 2011; Moore & Jensen, 2007). In the master course, students in the THE cohorts scored lower on the knowledge retention test than the ICE cohort, with the largest differences in the later cohorts, and the final THE cohort also reported significantly lower examination grades. This pattern persisted in the cohorts that were less affected by the COVID-19 pandemic and restrictions on in-person education. In the bachelor course, a similar trend is visible, but no significant differences between ICE and THE cohorts were found in either examination grades or knowledge retention scores. Consistent with arguments favoring ICEs, students may have made a calculated choice to invest more time in studying for an ICE than for a THE, for which they could rely more on the materials (Moore & Jensen, 2007). While these findings may be attributable to a cohort effect, the overall results do show a negative trend in academic outcomes for the THE cohorts compared to the ICE cohorts.

The data used in this study span four cohorts affected to different degrees by the COVID-19 pandemic. The second bachelor cohort (2020–2021), which completed a THE, reported significantly lower wellbeing, and somewhat (though not significantly) higher workload, than the first, pre-pandemic cohort (2019–2020). No cohort differences in wellbeing were detected among the master students. This finding can cautiously be interpreted in two ways. First, an abundance of research shows that the severe lockdown conditions had a negative impact on students’ wellbeing (Barbour & van Meggelen, 2023; Wang et al., 2020). Second, many programs implemented THEs for the first time during the pandemic. It is therefore possible that the onset of the pandemic and the introduction of THEs interacted to intensify the stress students experienced at the time. While educators expected that access to the course material during the examination would reduce test anxiety (Tao & Li, 2012), in practice students reported mixed feelings about THEs (Er et al., 2023; Slack & Priestley, 2023). The challenges of remote learning, pandemic pressures, and uncertainty about THEs may have combined to exacerbate stress and anxiety within the learning environment. Notably, the third bachelor cohort, which also took a THE but under only partial COVID-19 restrictions (2021–2022), did not report significantly lower wellbeing than the first cohort. Students may have adjusted to the pandemic restrictions and/or gained familiarity with THEs, making the study and examination conditions less intimidating.

Although no large differences in wellbeing were found across cohorts, the findings do suggest that students who reported lower wellbeing suffer academically: they reported lower course grades, higher perceived workload, and less skill development, consistent with prior work (Moreira de Sousa et al., 2018; Thomas et al., 2017). While a causal relationship cannot be established, academic performance and wellbeing appear to be closely linked and should not be addressed in isolation. The link between examination form and wellbeing, however, is less clear.

Finally, in contrast to previous findings (Slack & Priestley, 2023), no association between examination form and perceived workload was found. This may be explained by the fact that the course load, goals, materials, and learning activities remained constant across cohorts. Similarly, while some scholars have found a link between examination form and skill development (Johanns et al., 2017), no such relationship could be established in this study. Here too, both courses included other forms of assessment that remained constant across cohorts, such as group research assignments and/or group presentations. As such, changing a single element (the examination form) may not have been enough to influence broader skill development.

The current study has several strengths; notably, it includes measures of a variety of academic and wellbeing outcomes across four cohorts of bachelor and master students who completed either an ICE or a THE. However, a few limitations should be noted. First, the response rates in some cohorts and courses were relatively low, and as a result, the sample sizes are modest across cohorts. Second, and relatedly, the sample may suffer from selection bias. Students with very low wellbeing or poor academic performance may have opted out of participating in the study. Indeed, respondents reported, on average, higher examination grades than the average examination grade for the course. This means that the sample may not capture the full range of academic and wellbeing outcomes within courses, and the results may be skewed toward students with higher examination grades and/or wellbeing. Recent studies have highlighted the importance of examining the relationship between assessment and wellbeing, especially for students with mental health problems (Lister et al., 2023). Future research could delve deeper into a variety of wellbeing outcomes and measure them both before and after course completion in order to obtain a more complete view of the impact of assessment on student wellbeing. Finally, the knowledge retention quiz used a multiple-choice format and therefore did not allow us to test the higher levels of Bloom’s taxonomy. While the quiz questions required analytical skills, they were “retrieval” questions, which may not capture the full scope of what students academically gained and retained from the course. Future studies could consider a more open-ended approach to the time-lagged knowledge retention measure, such as an essay examination or an oral examination, in combination with a time-lagged test of students’ skill development.

Overall, no clear-cut advantage was found for either examination form with regard to student academic performance or wellbeing. Long-term retention appears to be primarily associated with examination success, underscoring the importance of student preparation for an examination regardless of its form.

There are two possible implications for education based on these findings. First, assessment type may not play as strong a role in academic outcomes as expected, suggesting that other personal and contextual factors also influence learning and knowledge retention. The selection of assessment type should ideally reflect the course learning goals, its online or offline learning activities, and the broader curriculum in which the course is embedded.

Second, given that the link between assessment and wellbeing may be bidirectional, it would be wise to consider two pathways of action. On the one hand, changing to a more open form of assessment (such as a THE) is not necessarily perceived by students as easier or less stressful. Providing students with substantially more guidance, so that they can gain familiarity with this examination form, may reduce any negative wellbeing outcomes. On the other hand, an institution- or program-level approach to tackling student wellbeing may also prove fruitful for academic performance.