1 Introduction

Disruption from the recent COVID-19 pandemic, whose lockdown measures affected 220 million students worldwide (UNESCO 2021, Marinoni et al. 2020), rapidly transformed on-campus teaching into “emergency online education,” thereby creating challenges for students and lecturers alike (Marinoni et al. 2020). In particular, the temporal and spatial distances inherent in asynchronous lectures made it difficult to maintain student learning engagement across the entire semester (Roblyer et al. 2007), leading universities to promote course digitalization, and teaching and learning methods that go beyond the lecture format (Bhagat and Kim 2020). This necessity resulted in a growing demand from educational institutions for different tools and digital devices to facilitate online teaching (UNESCO 2021) of students forced to learn off campus.

One means of addressing this need is to support lectures with short message service (SMS) texts, an inexpensive and easily implemented technology that can assist both content learning and student motivation. Given students’ generally constant access to mobile devices (Ziden et al. 2017), text messaging can convey course-related information like relevant reading material while encouraging consistent review of lecture subject matter (Breunig et al. 2021; Ziden et al. 2017). In doing so, it promotes time- and location-independent engagement with learning content (Gasaymeh and Aldalalah 2013). In higher education especially, such messages can even influence behavior and serve as a motivator by potentially nudging students in their educational decisions (Escueta et al. 2020). For example, timely notifications can increase student attention to specific coursework (Castleman and Meyer 2020) among students struggling to allot sufficient class preparation time to numerous different courses (Breunig et al. 2021).

In recent years, a handful of researchers have evaluated the impact of text messages on student academic performance using randomized controlled trials (RCT) that employ either motivational, course-related, or performance-oriented intervention content. The findings, however, are mixed: Whereas a few experiments indicate SMS-induced performance gains (e.g., Castleman and Meyer 2020) stemming from either a motivational effect (Gasaymeh and Aldalalah 2013) or improved study behavior (O'Connell and Lang 2018), several large-scale studies find no evidence for any effect on academic attainment (e.g., Breunig et al. 2021; Dobronyi et al. 2019; Gill et al. 2016). The researchers variously explain these null effects by low-touch and infrequent interactions (Breunig et al. 2021; Oreopoulos et al. 2020) or the ease of ignoring messages (Lavecchia et al. 2016). Whatever the attribution, this lack of consensus on text messaging’s actual impact underscores the importance of further research (Oreopoulos et al. 2020), a need as yet met by only a handful of experiments. Our work is thus a useful expansion of this limited research on the impact of course-related SMS messages.

To test SMS message effectiveness for improving student academic performance, we assess whether and to what extent a simple SMS nudge influences final grades and whether content plays a role in this dynamic. We do so via an RCT in which 346 master’s course participants receive regular study reminders and information on lecture content in messages formulated solely to regulate study behavior without changing the students’ state of information or incentives. Rather, this nudge is designed to influence student learning engagement so it translates into improved performance, a treatment effect we measure by final examination and online test scores. We also explore the underlying mechanisms and potential transmission channels for any observed increase in performance by administering an endline survey of such subjective measures as time management and motivation.

Not only does this study provide additional evidence on the effectiveness of course-related SMS messages in nudging higher education students toward improved performance, it is one of only three we know of that use RCTs to assess these nudges’ effects on class outcomes in a university setting. The first, a large-scale evaluative experiment at a Canadian university (Oreopoulos and Petronijevic 2018), uses SMS communications (either information about campus offers or messages of encouragement) to compare the influence of in-person coaching with personalized mobile coaching. Whereas in-person coaching appears to positively impact GPA and grades, SMS coaching seems less promising. The authors therefore conclude (as do Dobronyi et al. 2019) that motivational SMS messages are not an effective tool for nudging students toward better academic achievement. These findings are confirmed in a study closely resembling our own (Breunig et al. 2021), which reports that simple weekly reminders on recommended readings for upcoming lectures do not enhance the learning engagement of undergraduate students. Our study contrasts with this earlier work, however, in two important ways: First, our focus is on postgraduate students, a demographic arguably more responsive to text messages than undergraduates; and second, our intervention took place during the COVID-19 pandemic, when the lack of face-to-face student-faculty interaction raised the importance of the messaging medium itself. At the same time, it also differs methodologically from a handful of other pandemic-era studies using RCTs to gauge the efficacy of various interventions (e.g., Hardt et al. 2020; Carlana and La Ferrara 2021), none of which employed a simple, cost-effective SMS nudge within a postgraduate setting.

2 Experimental design

Experimental participants. The participants in this experimental study were students in an English-language graduate course at the University of Hohenheim, Stuttgart (Germany), covering the basic theory and application of various statistical methods for multivariate data analysis (MVDA) (see Tables 1 and 2 for a topical overview and program-dependent course characteristics, respectively). Although this course is open to students from different programs and faculties, most are pursuing a Master of Science in either Management or Education for Business and Economics. Participant recruitment occurred during the first week of the 2020/2021 winter semester, when lecturers announced the option to receive course information and useful reminders via SMS. Volunteers registered online by providing consent and a mobile phone number, as well as optionally filling out a short survey on basic demographic and academic characteristics (see consent form in B1). We then used a reproducible computerized procedure to randomize them into a treatment group and a nonplacebo control group.
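For concreteness, such a seeded assignment can be sketched in a few lines of Python (an illustrative reconstruction with hypothetical identifiers; the paper does not report its actual implementation):

```python
import numpy as np
import pandas as pd

# Hypothetical registration list; in the study, 346 students signed up.
students = pd.DataFrame({"student_id": range(1, 347)})

rng = np.random.default_rng(seed=2020)            # fixed seed -> reproducible
shuffled = rng.permutation(students["student_id"].to_numpy())

n_treated = 176                                   # group sizes from the paper
students["treated"] = students["student_id"].isin(shuffled[:n_treated]).astype(int)

print(students["treated"].value_counts())         # 176 treated, 170 control
```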

Table 1 MVDA topics
Table 2 Interprogram differences in the final examination

Experimental setting. As at many other institutions, the COVID pandemic forced the University of Hohenheim to switch to a fully online teaching modality for the entire winter semester 2020/2021. Under this modality, the course comprised 15 asynchronous lectures grouped into seven work packages, which culminated in a live digital wrap-up session in which students could ask the lecturer questions and receive clarifications. In addition to posting videos of the recorded lectures and related materials on a weekly basis, the university’s ILIAS e-learning platform hosted an online forum for lecturer-student communication, as well as an online test for each work package. These tests consisted of 12–20 multiple-choice questions resembling those on the final examination, for which the platform also provided simulations.

Although both the online tests and the final examination required 50% correct answers for a pass, the written examination determined the bulk of the final course grade, with successful completion of the online tests offering up to 10 extra percentage points (for a possible maximum of 110%, although 100% sufficed for the highest grade). This chance for extra credit was intended to encourage students to remain up-to-date throughout the semester while also compensating for the potentially negative effects of pandemic teaching modalities or, in the control group, the missed opportunity for additional encouragement.
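Under one plausible reading of this scheme (a hypothetical sketch only, not the official grading rule), the bonus enters as follows:

```python
def course_result(exam_pct: float, bonus_pct: float) -> tuple:
    """Combine the exam score (0-100) with the online-test bonus (0-10).

    Hypothetical reading: up to 110% is attainable, but 100% already earns
    the highest grade, so the bonus acts as a cushion rather than extra range.
    """
    total = min(exam_pct + bonus_pct, 100.0)   # cap at the top of the scale
    return total, total >= 50.0                # 50% correct required to pass

print(course_result(95.0, 10.0))   # (100.0, True): bonus cannot exceed full marks
print(course_result(42.0, 10.0))   # (52.0, True): bonus can help secure a pass
```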

We graph the dynamics between the winter semester 2020/2021 MVDA course and the SMS intervention in Fig. 1. On the right-hand side, we outline the course’s primary teaching modalities, which, together with course design, supplementary materials, and lecturer experience and enthusiasm, inherently influence student motivation and time management. Importantly, all the above-mentioned factors ultimately affect the level of knowledge acquired during the course and thus academic performance. However, because a large share of performance variation is also attributable to individual demographics, registration for the experiment included voluntary collection of relevant student characteristics to check for pretreatment balance. In addition, as the flowchart illustrates, we administered an endline survey of student satisfaction with the course modalities to assess the post-treatment balance.

Fig. 1 SMS Intervention Flowchart

Experimental procedure. Based on the documented efficacy of SMS nudge interventions in other contexts (Goh et al. 2012), we designed our SMS messages to boost student motivation and course involvement through direct engagement via an informal medium (the smartphone) outside of scheduled online lectures and forums. We thus coordinated the message content and topics with weekly course lectures, practical sessions, and online tests.Footnote 1 To ensure experimental fairness, these nudges provided no additional information (i.e., each SMS contained only the same information as the lecture slides), and their content was uploaded weekly onto the ILIAS e-learning platform, thereby treating the control group with online exposure only. Because all students, regardless of treatment assignment, had to log into the e-learning platform frequently to (re)watch the weekly posted lectures, slides, online tests, and live wrap-up sessions, the control group had easy access to the SMS content.

Although this content could admittedly have been shared across various student social media groups, its disclosure was necessary to reduce any discomfort among control group members (i.e., reverse engagement bias) and to allow disentanglement of any motivational treatment effect from improved academic outcomes due simply to greater information availability (cf. Heckman et al. 2000). We thereby not only isolated the SMS messages’ motivational effect but also mitigated any potential reverse engagement bias among the control group observations. As an additional guard against bias, we also kept the SMS content hidden from the lecturers who designed the final examination and graded the online tests.

In evaluating the research outcomes, we apply the minimum detectable effect (MDE) size reported in our original study protocol submitted to the Ethics Committee of the University of Hohenheim in May 2020, derived from an a priori power analysis using the standard deviation of a sample of 270 MVDA examination grades from the previous winter semester (17.3 points). For this calculation, setting a power (1 − β) of 0.80 and a statistical significance threshold (α) of 0.05 yielded an MDE of 4 points (or 0.23 SD units).
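Such an MDE calculation can be approximated with a standard two-sample power routine; the sketch below uses statsmodels and a hypothetical per-arm sample size chosen to roughly reproduce the reported figure (the protocol’s exact inputs are not restated here):

```python
from statsmodels.stats.power import TTestIndPower

sd_prev = 17.3   # SD of the 270 MVDA examination grades, previous winter semester

# Smallest standardized effect detectable with 80% power at alpha = 0.05,
# assuming (hypothetically) about 270 students per arm.
mde_sd = TTestIndPower().solve_power(nobs1=270, alpha=0.05, power=0.80,
                                     ratio=1.0, alternative="two-sided")
print(f"MDE = {mde_sd:.2f} SD units = {mde_sd * sd_prev:.1f} points")  # ~0.24 SD, ~4 points
```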

To minimize the treatment effect standard error, we randomized the days and times at which treated participants received individual text messages, based on conventional study hours (Monday to Saturday, 8 AM–7 PM) (Ziden et al. 2017). We had previously determined the optimal treatment frequency by means of a pilot study implemented in the summer semester 2020,Footnote 2 which identified three texts per week (i.e., 34 SMS messages over the semester) as appropriate for increasing engagement while avoiding information fatigue. Likewise, to increase message effectiveness, we paid close attention to content framing and wording (Ajzenman and López Bóo 2019), avoiding repetitive sentences, using sufficient linguistic variety, and selecting terms that communicate a sense of urgency or reward (e.g., “last chance” or “successfully”), especially in deadline reminders.
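A randomized dispatch calendar of this kind could be generated as follows (an illustrative sketch; the actual scheduling tool is not described in the paper):

```python
import numpy as np

rng = np.random.default_rng(seed=42)

WEEKS = 12                                          # lecture weeks with dispatch
DAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat"]   # conventional study days
HOURS = list(range(8, 20))                          # send window: 8 AM to 7 PM

schedule = []
for week in range(1, WEEKS + 1):
    # Three texts per week (34 in total in the study), on distinct random days.
    days = rng.choice(DAYS, size=3, replace=False)
    for day in sorted(days, key=DAYS.index):
        schedule.append((week, day, f"{rng.choice(HOURS):02d}:00"))

print(schedule[:3])   # e.g., [(1, 'Tue', '09:00'), (1, 'Wed', '14:00'), ...]
```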

Treatment evaluation. We evaluate the intervention treatment effect based on points earned on the final examination (with and without inclusion of the 10% extra credit for online tests). Because the first examination session occurred immediately after the lecture phase ended, whereas the second took place 3 months later at the start of the next semester, the resulting dataset allows comparative analyses of the short- and longer-term effects (see Figure B2 and Table C1 for the project timeline and SMS content, respectively).

Fig. 2 Flow diagram of sampling process

3 Data and identification strategy

3.1 Data

Electronic survey data. Beginning in the second week of the winter semester 2020/2021 (November 2020), we collected baseline and follow-up information via an ILIAS online survey from the 393Footnote 3 students enrolled in the MVDA graduate class, 346Footnote 4 of whom signed up for the SMS message experiment and provided such demographic data as gender, age, and program of study. Randomization of this sample (see Fig. 2) yielded treatment and control groups of 176 and 170 participants, respectively.

In the last weeks of the semester before the final examination, we encouraged students to complete a second short online survey on their perceptions of the lecture modalities; more specifically, on their experience with the course environment and their own study behavior. The 152 students who did so (75 and 77 from the treatment and control groups, respectively) constitute a substantially smaller sample than the original. Nonetheless, given the strict anonymity of the official student evaluations, this second survey served as a post-treatment balance test. It also collected information on outcomes for which we expected no relation with the motivational SMS nudges, including dummy and Likert scale measures of satisfaction with asynchronous lectures, experience of technical issues, and willingness for teaching to remain online in a post-pandemic scenario. Rather than any direct questions about SMS reception, the survey included an open-ended item inviting general comments on the lecture modalities. The only respondent to suggest improved SMS delivery was a control group member who had received no messages.

Administrative data. As previously outlined, we proxy student performance by final examination and online test grades obtained from the official university examination registry. Because students from different master’s programs must fulfill slightly different requirements to pass the final, we rescaled our two main outcome variables (final examination and online test points) to a 0–100 range for comparability. Additionally, we created two longitudinal datasets on the grades for single examination and online test questions, respectively. Depending on the level of difficulty, the questions were worth between 1 and 4 points; for comparability, however, we rescaled them to a semi-continuous measure (from 0 to 1). In all regression analyses on examination and online test points, we present treatment effects in SD units. Lastly, we mark each question as either treated or not, based on whether the associated SMS message contains useful information for answering it correctly.
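These transformations are straightforward; a compact sketch (with hypothetical column names) might look as follows:

```python
import pandas as pd

# Hypothetical raw records: program-specific maxima and 1-4 point questions.
exams = pd.DataFrame({"raw_points": [45.0, 38.5], "max_points": [60, 55]})
questions = pd.DataFrame({"earned": [2.0, 1.0, 3.0], "worth": [2, 1, 4]})

# Main outcomes: comparable 0-100 scale across programs.
exams["points_0_100"] = 100 * exams["raw_points"] / exams["max_points"]

# Question-level outcome: semi-continuous share of available points (0-1).
questions["score_0_1"] = questions["earned"] / questions["worth"]

# For reporting, treatment effects are expressed in SD units of the outcome.
std = exams["points_0_100"].std()
exams["points_sd"] = (exams["points_0_100"] - exams["points_0_100"].mean()) / std
```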

Descriptive statistics. For the final sample of 275 participants whose results are analytically relevant, the average age is 24, approximately two thirds are female, and most (83%) are first-semester enrollees in the master’s in management program (see Table 3, Panel A). As column 5 shows, the baseline characteristics of the treatment and control groups are balanced (at α = 0.05) both individually and jointly (F test p value = 0.465). Of all those enrolled in the course, 43% took part in the endline survey (Panel B). Finally, in these longitudinal data, the average (rescaled) score per question on the examination and online tests was 0.79 and 0.67 points, respectively, with 16% and 14% of the questions classified as treated (Panels C and D).

Table 3 Descriptive statistics: full sample

3.2 Identification strategy

In the nonplacebo randomized controlled trial used to estimate the motivational SMS message impact on student academic outcomes, although the randomly selected treatment group cannot contain never-takers,Footnote 5 the treatment coefficient must be interpreted as an intention-to-treat (ITT) effect. In fact, some students might have decided to opt out of treatment non-randomly by simply ignoring the SMS nudges. Likewise, the overall sample cannot contain always-takers because no student in the control group ever receives a message. Indeed, because take-up of interventions like ours cannot be forced on the students, the ITT effect is the effect of interest. Nonetheless, as long as both groups’ semester-long access to the SMS content on the ILIAS platform ensured their equal satisfaction with course modalities (i.e., the assumption of no reverse engagement bias holds), the treatment impacts are likely to isolate a purely motivational effect, as subsequently tested using post-treatment measures of course satisfaction.

For the remainder of this discussion, we formalize the different estimation strategies based on the type of data analyzed (i.e., whether cross-sectional or longitudinal). We begin with Eq. 1, which formalizes the OLS estimation strategy for the set of regressions that use cross-sectional data to measure the motivational message effect on final examination points, including the (10% extra) online test points. Here, \({Y}_{i}\) denotes the examination points, \({T}_{i}^{\mathrm{group}}\) is a dichotomous indicator taking the value one if an observation belongs to the treatment group and zero otherwise, \({X}_{i,j}\) is the \(j\)-th of eight demographic and academic control variables at the individual level (with coefficients \({\beta }_{j}\)), and \({\varepsilon }_{i}\) is an idiosyncratic error term.

When the dependent variable is the probability of passing the final examination (with or without the 10% extra points), we employ the same specification as in Eq. 1 but estimate it using logistic regression. To maximize statistical power, we also pool the question-level data, which considerably increases the sample size and enables robustness testing of the estimates obtained from the individual-level cross-sectional data. Statistical inference for all models accounts for heteroskedasticity by correcting the standard errors with a sandwich estimator, or by clustering at the individual level when using pooled cross-sectional data.

$${Y}_{i}={\alpha }_{0}+{\alpha }_{1}{T}_{i}^{\mathrm{group}}+{\sum }_{j=1}^{8}{\beta }_{j}{X}_{i,j}+{\varepsilon }_{i}$$
(1)
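In practice, Eq. 1 with sandwich standard errors corresponds to a regression along these lines (an illustrative statsmodels sketch on synthetic data, with hypothetical variable names standing in for the eight controls):

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic stand-in for the cross-sectional estimation sample.
rng = np.random.default_rng(0)
n = 275
df = pd.DataFrame({
    "points": rng.normal(70, 17.3, n),
    "treated": rng.integers(0, 2, n),
    "age": rng.integers(21, 30, n),
    "female": rng.integers(0, 2, n),
})

# Eq. 1: exam points on treatment plus controls, robust (HC1) standard errors.
result = smf.ols("points ~ treated + age + female", data=df).fit(cov_type="HC1")
print(result.summary().tables[1])
```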

When we employ longitudinal data at the question level, we estimate a two-way fixed effects OLS regression in which the treatment assignment is at the question level. Because our experiment is not designed as a cluster RCT, controlling for question fixed effects is essential to avoid any bias in the treatment estimates related to question characteristics. In fact, as noted earlier, we intentionally keep the instructors who design the questions blind to the SMS content, which itself maximizes treatment effectiveness by targeting relevant subject topics. This estimation strategy, formally outlined in Eq. 2, corresponds to a two-group generalized difference-in-differences (DID) approach in which the coefficient of \({T}_{i,q}^{\mathrm{question}}\) can be interpreted as an ITT. The estimated standard errors account for heteroskedasticity and are clustered at the individual level.

$${Y}_{i,q}={\alpha }_{0}+{\alpha }_{1}{T}_{i,q}^{\mathrm{question}}+{\varphi }_{i}+{\zeta }_{q}+{\varepsilon }_{i,q}$$
(2)
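Estimating Eq. 2 amounts to including student and question dummies and clustering at the student level, e.g. (a sketch on synthetic data; in the study, the treatment indicator equals one only for treatment-group students on SMS-matched questions):

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

# Synthetic student-by-question panel standing in for the real data.
rng = np.random.default_rng(1)
panel = pd.DataFrame(
    [(s, q) for s in range(50) for q in range(20)],
    columns=["student", "question"],
)
panel["treated_q"] = rng.integers(0, 2, len(panel))   # varies at question level
panel["score"] = rng.uniform(0, 1, len(panel))        # rescaled 0-1 outcome

# Eq. 2: question score on treatment with student and question fixed effects;
# standard errors clustered at the student (individual) level.
twfe = smf.ols("score ~ treated_q + C(student) + C(question)", data=panel).fit(
    cov_type="cluster", cov_kwds={"groups": panel["student"]}
)
print(twfe.params["treated_q"], twfe.bse["treated_q"])
```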

The strategies outlined above also enable robustness testing of our RCT design as follows: First, if the treatment is properly randomized, the inclusion of control variables should not significantly change the point estimates, a constancy we can test by running bivariate regressions of Eqs. 1 and 2. Second, according to the a priori statistical power analysis, our cross-sectional data can correctly detect a statistically significant effect at the 5% significance level 80% of the time when the true effect exceeds the MDE of 0.23 SD units (4 points). The longitudinal samples of examination and online test questions have an even greater ability to correctly detect a true statistically significant ITT, with a sample size of 11,362 examination question-student observations yielding an MDE of 0.018 points (0.045 SD units; 1 − β = 0.80; α = 0.05; SD = 0.40). It should be noted, however, that this latter MDE calculation was performed ex post because we did not have access to prior information on these longitudinal samples.

3.3 Potential bias

Although the treatment and control groups are statistically balanced on demographic and academic characteristics (see Table 3), which implies that the conditional independence assumption holds, biased treatment effect estimates could result from several mechanisms, whose threats to causality, limitations, and solutions we discuss below:

Selective attrition. Because the German higher education system allows students to sit the final examinationFootnote 6 either a few days after the last lecture or 2–3 months later, students might self-select into a specific session. In the case of MVDA, most students choose the first available sitting, with those selecting the later session showing lower average academic performance, even after we exclude students who previously failed. Obviously, if such self-selection is based on background characteristics and the pretreatment balance does not hold, then the first session sample estimates cannot be considered unbiased. However, according to the descriptive statistics and difference in means for the first session subsample (Table C2) and the estimated treatment effect on the probability of sitting the first session examination (Table C3), such is not the case. It should also be noted that 60 of the students enrolled in the experiment did not sit the examination in either session, but these observations are both balanced in their pretreatment characteristics and equally distributed between the treatment and control groups.

Reverse engagement bias. If exclusion from receiving the SMS nudges had any effect on the control group’s academic performance (reverse engagement bias), these observations would not represent a valid counterfactual (because they were possibly negatively affected by the intervention). In fact, were this the case, the estimated ITT would not approximate the per-protocol effect. Determining the bias direction in this scenario, however, is far from simple, given that exclusion could either have decreased control students’ motivation (overstating the ITT) or pushed them to study harder to compensate for the lack of treatment (understating the ITT). Our experimental design thus tries to eliminate both possibilities by providing the SMS content to the whole sample via the e-learning platform. The appropriateness of this strategy is supported by the fact that, in the (anonymous) lecturer evaluations, only one control group student reported not receiving any SMS messages.

To rule out the possibility of reverse engagement bias, we conduct a series of analyses that alternately regress three different post-treatment measures of course satisfaction, as well as a dummy for endline survey participation, on the treatment group dummy variable and the full set of control variables. As Table 4 shows, the treatment variable coefficient never differs statistically from zero, confirming the improbability of any reverse engagement bias. Moreover, a supplemental test of whether the control observations’ awareness of their status made them work harder to compensate for the lack of treatment exposure offers no evidence of compensation for not receiving the SMS directly. For this test, because the online forum and live wrap-ups provided the sole opportunity to ask lecturers for clarification, we assess the treatment effect on the probability of posting in the forum at least once (see Table C4).
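Schematically, these checks reduce to a loop of OLS regressions of each post-treatment outcome on the treatment dummy (a sketch on synthetic data with hypothetical outcome names):

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(2)
n = 152                                  # endline-survey respondents
survey = pd.DataFrame({"treated": rng.integers(0, 2, n),
                       "age": rng.integers(21, 30, n)})
outcomes = ["satisfied_lectures", "technical_issues", "prefers_online"]
for y in outcomes:                       # hypothetical Likert/dummy measures
    survey[y] = rng.integers(0, 2, n)

# Under no reverse engagement bias, the treated coefficient should be ~0 in
# every regression (cf. Table 4).
for y in outcomes:
    res = smf.ols(f"{y} ~ treated + age", data=survey).fit(cov_type="HC1")
    print(y, res.params["treated"].round(3), res.pvalues["treated"].round(3))
```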

Table 4 Effect of SMS on the course satisfaction measures: OLS, full sample

4 Results and discussion

Our econometric approach combines OLS and logit assessments of the treatment effect on academic outcomes and a series of generalized difference-in-differences (DID) models that measure the efficacy of matching SMS message texts to course content:

OLS regressions. To identify the causal effects of motivational SMS messages on the number of final examination points earned, we employ a series of OLS regressions that exploit the random experimental variation in the treatment variable. We report the outcomes first for the first session sample only (February 2021) and then for a pooled sample of the first and second sessions (July 2021) with retakers excluded (see Tables 5 and 6, respectively). The first column of each table lists the OLS estimates for a basic model with no control variables, after which we stepwise include demographic (column 2), academic (column 3), and examination-related (column 4) control variables.

Table 5 Effect of SMS on examination points: OLS, first session sample
Table 6 Effect of SMS on examination points: OLS, full sample

Because the inclusion of additional controls is likely to change the ITT effect size by a significant amount when the treatment correlates with any observables and/or with linked unobservables (i.e., in the presence of severe bias), this progressive inclusion tests the stability of our ITT estimates. At the same time, given that our a priori power analyses can correctly detect a statistically significant effect 80% of the time only when the true ITT is larger than 0.23 SD units (4 points), the inclusion of a set of meaningful control variables improves the precision of the ITT point estimates, allowing us to draw more statistically valid conclusions.
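This stability check is mechanical: re-estimate the model while progressively expanding the control set and compare the treatment coefficients (an illustrative sketch with hypothetical control groupings):

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(3)
n = 230
df = pd.DataFrame({
    "points": rng.normal(70, 17.3, n),
    "treated": rng.integers(0, 2, n),
    "age": rng.integers(21, 30, n),
    "female": rng.integers(0, 2, n),
    "program": rng.choice(["mgmt", "educ", "other"], n),
})

# Columns 1-4 of Tables 5-6: progressively richer control sets.
control_sets = {"none": [], "demographic": ["age", "female"],
                "academic": ["age", "female", "program"]}

for label, ctrls in control_sets.items():
    rhs = " + ".join(["treated"] + ctrls)
    fit = smf.ols(f"points ~ {rhs}", data=df).fit(cov_type="HC1")
    print(f"{label:>12}: ITT = {fit.params['treated']:6.2f}")  # should be stable
```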

According to the first session estimates (Table 5), receiving the motivational SMS messages improves academic performance by between 0.20 and 0.25 SD units (3.38 and 4.34 points), depending on specification. Although all four ITT coefficients are statistically significant at the 0.05 level, we can confidently state that the treatment effects are positive and statistically different from zero only when the ITT estimate exceeds the MDE of 0.23 SD units (4 points; columns 3 & 4). The effects for the full sample of experimental participants who sat the final examination, however, are significantly smaller (approximately half the MDE) and statistically nonsignificant (see Table 6), which we interpret as evidence of a nontrivial amount of treatment heterogeneity between the first and second session examinees. In fact, even when we use a dummy indicator to limit the potential bias from self-selection into the second session, the treatment effect is likely to vanish in the longer term (about 3 months).

Given the treatment aim of boosting student motivation to keep up with lectures across the semester, this diminishment would presumably apply most to students already planning to focus first on other subjects and sit the MVDA examination in the second session. Although the much smaller second session sample of 45 observations precludes direct statistical testing of this hypothesis, the treatment appears to have no statistically significant effect on the probability of sitting the first versus the second examination session (see Table C3).

Logit regressions. We next ensure the quality of the above ITT estimatesFootnote 7 by using a logit maximum likelihood estimator to regress a dichotomous variable for passing the final examination on the treatment indicator. We again estimate this model for both the first session (Table 7) and the pooled cross-sectional (Table 8) samples while either including or excluding (top vs. bottom panels) the extra points from the online tests (0–10). As no analysis using graded online tests and the related extra points as outcome variables was included in our initial experimental protocol, we view the associated results as exploratory.
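The logit specification and the marginal effects at mean values can be sketched as follows (synthetic data, hypothetical names):

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(4)
n = 230
df = pd.DataFrame({"treated": rng.integers(0, 2, n),
                   "age": rng.integers(21, 30, n)})
# Hypothetical pass indicator (with or without the 0-10 extra points).
df["passed"] = rng.integers(0, 2, n)

logit = smf.logit("passed ~ treated + age", data=df).fit(disp=0)
margins = logit.get_margeff(at="mean")   # marginal effects at mean values
print(margins.summary())
```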

Table 7 Effect of SMS on passing the examination: logit, first session sample
Table 8 Effect of SMS on passing the examination: logit, full sample

According to these new estimations, the treatment has a statistically significant positive effect only on the first session sample. More specifically, the marginal effects at mean values reported in each panel suggest that being treated increases the probability of passing the examination by about 6.5 percentage points (pp) (Table 7, column 4) when extra points are included and 7.6 pp (Table 7, column 8) when not.

Not only do these results demonstrate that the positive ITT effect is not solely dependent on the extra test points, but the slightly larger treatment effect when they are omitted conforms to our expectation that the treatment, by stimulating more regular study, tends to benefit formal examinations more than open-book online tests. Conversely, we do not detect any positive treatment effect when using the pooled cross-sectional sample (see Table 8). We conclude that the SMS treatment had an economically meaningful effect on the probability of passing the examination for first session students but that, owing to limited statistical power, we are unable to determine whether the positive ITT estimate for the full sample reflects a true effect.

Pooled OLS. To address any concerns regarding statistical power, especially when using the two-session sample, we replicate the analysis reported in Tables 5 and 6 employing data at the student-question level. In addition to substantially increasing the sample size (11,362 and 8,902 observations), this replication exploits the random variation in the treatment variable. In all models, the dependent variable takes a continuous value between zero and one, and all specifications control not only for demographic, academic, and examination characteristics, but also for question fixed effects.

The results of these pooled OLS regressions reveal that, on average, treatment with the motivational messages results in a 0.05 (full sample) or 0.07 (first session sample) SD units increase in points earned on each question (Table 9, columns 1 & 3), with both estimates highly statistically significant once corrected for heteroskedasticity. Although these results confirm some degree of effect heterogeneity between students in both sessions, they also underscore the likelihood that the statistical nonsignificance of the full sample cross-sectional results is due to insufficient statistical power to detect an effect smaller than the MDE.

Table 9 Effect of SMS on examination questions: OLS

Exploiting this same empirical approach, we also conduct an exploratory analysis of the ITT effect of SMS message reception on the online test points per question (in SD units), distinguishing the more technical course content. That is, rather than differentiating between examination sessions, we estimate the ITT for the full set of online tests (Table 10, column 1) versus the subset of tests related to OLS, logit, ANOVA, and factor analysis (column 3). As in the previous results for examination performance, the treatment increases the points earned by 0.03 and 0.02 SD units, respectively, but with only the first estimate being statistically significant. Compared to the effects on examination question points, these smaller ITTs are in line with our previously stated assumption that the treatment is only marginally effective in the context of an open-book test.

Difference in differences. Our final analysis exploits the longitudinal nature of the examination and online test question dataset by estimating a series of difference-in-differences (DID) models that test the point gain when an SMS message is closely related to a specific question. This treatment dummy variable varies at the question level for those in the treatment group but remains always equal to zero for those in the control group. In addition to the full set of controls, these models also include two sets of fixed effects that correct for individual- and question-level heterogeneity. Interestingly, when the message and examination question are on the same topic, we observe a null effect (Table 9, columns 3 & 4), but when an online test question and an SMS message are on the same topic, students gain an average of 0.045 or 0.065 SD units (Table 10, columns 2 & 4, respectively). These results, together with those when the treatment is defined by group (e.g., Tables 5 and 6), confirm that the SMS messages have a beneficial effect on academic performance, but through motivation rather than the provision of additional information. We can draw this conclusion not only because the SMS message content matched that presented in lectures (and reported in the course slides), but also because our experimental design provided the same information to every student via the e-learning platform.

Table 10 Effect of SMS on online test questions: OLS

Because the 4-point treatment effect of the motivational nudge corresponds to only a small behavioral change achieved at marginal cost (Smith et al. 2018), we lastly examine the results from the perspective of cost-effectiveness. Given a total expenditure for the entire SMS project of approximately €328 (including initial costs for the mobile phone, SMS dispatch, and personnel time for designing, organizing, and implementing the intervention), the amount per student in a treatment group of 176 is €1.86, relatively inexpensive compared to other behavioral interventions in higher education. Based on a treatment effect of 0.25 SD, this sum equals about €7.44 per one SD increase in performance. Programs such as tutoring (Carlana and La Ferrara 2021), mentoring (Hardt, Nagler, and Rincke 2020), or SMS coaching (Oreopoulos et al. 2020) clearly entail higher expenses due to staff training costs. Hence, given an effect on academic performance comparable to or somewhat more pronounced than that achieved with greater effort in other studies, our intervention seemingly exhibits high cost-effectiveness, an assessment supported by Kraft's (2020) cost-effectiveness analysis in the context of educational interventions.
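The back-of-the-envelope arithmetic follows directly (reproducing the figures above):

```python
total_cost_eur = 328          # phone, SMS dispatch, and personnel time
n_treated = 176
effect_sd = 0.25              # upper-bound ITT from Table 5

cost_per_student = total_cost_eur / n_treated
cost_per_sd = cost_per_student / effect_sd

print(f"per student: EUR {cost_per_student:.2f}")   # ~1.86
print(f"per SD gain: EUR {cost_per_sd:.2f}")        # ~7.45 (7.44 when the
                                                    # per-student cost is rounded first)
```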

5 Conclusions

By using an RCT experimental framework to investigate the causal effect of motivational SMS nudges on student academic performance, we confirm that one-way communication between instructors and students via SMS outside scheduled lecture hours is an effective tool for improving performance on both online tests and the final examination. This positive treatment effect is in line with that documented by earlier studies in different contexts (e.g., Angrist et al. 2022; So 2016). The experiment’s nonplacebo design also provides evidence on the mechanism underlying the treatment effect; namely, greater student engagement and motivation, as opposed to the availability of additional information (as in Paloyo et al. 2016). We also acknowledge, however, that based on the 30% to 40% reduction in the estimated ITT among students who sat the examination around 3 months after the last course lecture, the treatment effect is likely to fade with time.

We acknowledge several limitations in interpreting the study results that could potentially affect their broader applicability (external validity). Most notably, at least three different explanations are possible for our intervention yielding positive treatment outcomes not observable in other studies, one of them general, the others more contextually specific. First, and most obvious, the postgraduate participants in our study may have responded differently to the text messaging than their undergraduate counterparts in other studies. Second, the intervention occurred during the severest COVID-19 lockdowns, when the mandate against face-to-face contact likely raised the importance that students attached to direct SMS communication from the course coordinator. Third, because student-professor interactions in German universities are more formal than those in many other countries (e.g., the US), the cultural context could have significantly affected the impact of messaging directly from a faculty member. Finally, it should also be noted that our study’s external validity is negatively affected by the fact that about 12% of the students enrolled in the course did not agree to participate in the experiment.

Nevertheless, our research findings remain valuable to higher education lecturers and policy makers because of text messaging’s potential as an extremely low-cost supplementary learning tool that is easily implemented and reaches all class members regardless of personal characteristics. Of particular note, because we isolated the texts’ motivational effect by excluding any information not covered in the lectures, message effectiveness might be further enhanced by including extra information (e.g., background on the topic). Similarly, the relatively primitive nature of the simple SMS text suggests that substituting an over-the-top Internet service able to deliver audio, video, and other rich media might significantly improve the messaging experience and thereby its effectiveness (So 2016). Even personalization of the simple text message might improve motivational effectiveness, although doing so would also be more costly.

Whereas all such conjectures offer valuable opportunities for future research, our own findings confirm that supplementing traditional lectures with even a seemingly primitive communication medium built on simple, motivational, course-focused messages can efficiently and cost-effectively improve student motivation and engagement. Not only is this potential especially important when online teaching is unavoidable, but it also holds great pedagogical promise for on-campus teaching.