Introduction

Theoretical background

mHealth intervention for work-related stress

Work-related stress is a well-known risk factor for mental ill-health, including burnout syndrome and depression [1, 2]. These health issues are detrimental both to individuals and organizations as they are linked with increased sick leave, turnover rates, and productivity loss [3]. Interventions to prevent this problem are urgently needed to support the health and wellbeing of employees as well as decrease the social costs associated with stress-related health issues.

mHealth solutions are among the most promising options for providing effective, accessible, and scalable interventions in an organizational context [4, 5]. These kinds of interventions have been found to be effective for stress management and related health outcomes [6]. In addition, mHealth interventions are significantly easier to scale and standardize compared with conventional on-site workplace interventions [7, 8]. Given the large number of mobile phone users in today’s world, mHealth solutions provide unique opportunities for making interventions widely available.

One way to mitigate the negative consequences of work-related stress is through interventions promoting recovery strategies—behavioral and psychological strategies that alleviate short-term stress reactions [9]. Different forms of recovery behaviors (e.g., detaching from work and mindfulness) are well-known to have positive effects on stress-related health problems [10]. A behavior change intervention that successfully increases the quantity and quality of recovery strategies may thus be effective in combating the long-term negative effects of stress.

To effectively increase the uptake of recovery strategies, the intervention needs to support behavior change [11]. Behavior change techniques (BCTs), replicable components of an intervention designed to enable behavior change, provide a systematic method for designing behavior change interventions. The BCT framework presents a taxonomy of 93 behavior change techniques based on consensus from an expert panel, providing many benefits for the development and evaluation of an intervention [12].

Including BCTs in an intervention (for example, self-monitoring, feedback, goal-setting, knowledge-shaping, and rewards) increases the chances that the intervention will successfully induce behavioral change and have a positive effect on intended health outcomes [11]. Additionally, the structure, components, and content of the intervention should be rooted in theory-driven models of habit formation to further support long-term behavior change [13]. For instance, daily reminders that cue self-monitoring at a specific time support developing a habit of daily self-reflection.

Employing the BCT framework also has additional benefits, including improved replicability and faithful implementation [14]. By using replicable and well-defined intervention components it is significantly easier to accurately replicate the intervention in future studies. This aspect of replicability is also important for implementation, ensuring that real-world interventions implemented in workplaces are sufficiently similar to the intervention tested in a study.

The present study

This study presents a preventive, low-intensive mHealth intervention designed to mitigate the negative consequences of work-related stress. Low-intensive interventions and active monitoring are recommended as a first response for subthreshold diagnoses, which are prevalent in the overall population [15, 16]. This aligns with the “stepped care” approach, in which widely accessible and low-intensive initiatives are offered prior to more intensive and costly treatments [17].

The intervention is a mobile application used once daily over a period of four weeks. Each daily interaction takes a few minutes, prompting users to reflect on their current mood and providing suggestions for a variety of recovery strategies. Users thus develop a daily habit of self-monitoring their stress and energy levels while learning about effective tools for managing stress. See the Methods section for more details regarding the intervention.

The motivation of the present study is to pilot test the intervention and study protocol in preparation for a future randomized controlled trial (RCT). Pilot testing the intervention and study procedures at an early stage is critical to identify potential pitfalls that need to be addressed before conducting a full-scale trial [18, 19]. By investigating the study and intervention in preparatory phases, we can refine the study protocol and intervention design to maximize the chances of a successful RCT.

A novel aspect of the intervention that needs to be tested is the daily format in which users respond to questionnaires and engage with intervention content every day (see the Methods section for more details on the intervention design). This structure minimizes the effort associated with the intervention and supports a daily habit of employing recovery strategies. Furthermore, the intensive longitudinal design provides a fine-grained view of how each participant’s experience changes dynamically over time, providing unique insights into daily fluctuations and the impact of the intervention on an individual level [20].

With mHealth interventions it is also critical to pay attention to constructs such as engagement and acceptability. Engagement with mHealth interventions is notoriously problematic with many studies reporting low engagement rates, negatively impacting the potential effectiveness of the intervention [21, 22]. It is thus important to ensure an engaging and acceptable intervention design to optimize intervention effects.

Notably, engagement and acceptability are complex, overlapping constructs that are used in different ways depending on the scientific field and the purpose of the study. For the purposes of this study, engagement is defined in terms of how often participants use the intervention and how engaging they find the mobile application. Acceptability refers to wider aspects, such as how appropriate, relevant, and satisfactory the intervention and digital tool are overall [23].

A final motivation of the study is to assess whether the methods used to measure study outcomes function as intended. First, it is necessary to ascertain that participants complete the questionnaires and that these measures comprehensively reflect the intended intervention effects. Second, the daily repeated measures must be sensitive enough to detect within-person changes in order to maximize the information gained from the intensive longitudinal data [20].

Aims and objective

The objective of the study is to pilot test the study protocol and intervention design of a preventive, low-intensive mHealth intervention for work-related stress in preparation for a full-scale randomized controlled trial. The primary research questions are:

  1) Data collection procedure: What are the recruitment and retention rates of invited participants? Does the randomization algorithm function properly?

  2) Engagement: How often do participants use the intervention? Do they find the application easy and engaging to use?

  3) Acceptability: Do participants find the intervention satisfactory overall and perceive it as beneficial? Is the digital tool technically stable?

  4) Measurement quality: What is the completion rate for questionnaire items? What is the within-person variability in the daily measures?

Methods

Study design

Three groups of participants received different versions of the intervention, each containing a distinct set of recovery strategies (see Sect. “Intervention versions”). Participants were sequentially allocated to groups in a 1:1:1 ratio using blocked randomization with randomly selected block sizes (3, 6, 9). The first author generated the allocation sequence, enrolled participants, and assigned them to their intervention groups. The allocation sequence was generated using Sealed Envelope, an online service for creating blocked randomization lists (Sealed Envelope Ltd. 2022). All participants were blinded; they were not informed of which group they belonged to. The study took place between May 2022 and December 2022. Ethical approval was granted by the Swedish Ethical Review Authority. All methods were carried out in accordance with relevant guidelines and regulations, including the CONSORT checklist [24].
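The allocation logic can be illustrated with a short sketch. The Python below is a generic implementation of permuted-block randomization with randomly chosen block sizes, written under our own assumptions; it is not Sealed Envelope's implementation, and the arm labels, seed, and function name `blocked_allocation` are hypothetical.

```python
import random

# Hypothetical arm labels for the three intervention versions.
ARMS = ("social_support", "psychological", "physical_activity")
BLOCK_SIZES = (3, 6, 9)  # each size is a multiple of the number of arms


def blocked_allocation(n_participants, arms=ARMS, block_sizes=BLOCK_SIZES, seed=None):
    """Return a 1:1:1 allocation list of length n_participants.

    Each block contains every arm equally often (block_size // len(arms)
    times) in shuffled order; the size of each new block is drawn at random.
    """
    rng = random.Random(seed)
    allocation = []
    while len(allocation) < n_participants:
        size = rng.choice(block_sizes)
        block = list(arms) * (size // len(arms))
        rng.shuffle(block)  # random order within the block
        allocation.extend(block)
    # Cutting the list here can leave the final block incomplete.
    return allocation[:n_participants]
```

For example, `blocked_allocation(16, seed=1)` returns 16 arm labels. Because the list is cut at 16, the last block may be incomplete, and that is the only place where group sizes can drift apart.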

The intervention and data collection were implemented through the mobile application m-Path, a software designed for real-time monitoring as well as creating and providing interventions [25]. The study used a PPF (pre-, post-, follow-up) structure in which outcomes were measured immediately before the intervention, immediately after the intervention, and one month after the end of the intervention as depicted in Fig. 1. Data were also collected daily as part of the intervention (see the following section). Enrollment for the study was continuous, such that invited participants could choose when to start the study and intervention.

Fig. 1

Flowchart of study design. Note: Sixteen participants were assigned to the intervention groups, though only 15 completed the pre-test measure

The entire study protocol, including informed consent, data collection, and intervention material, was automated. An invitation email contained all necessary information for starting and completing the study. A link in the invitation email, together with a specific code, allowed participants to download the m-Path application and give informed consent through the application. Upon consenting, the pre-intervention measure became available in the app, and the intervention started the day after the pre-intervention measure was completed (if participants did not complete the pre-intervention measure within a week, the intervention started at that point).
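The start rule can be written as a small function. This is a hypothetical reconstruction for illustration, not m-Path configuration; in particular, we assume the one-week grace period is counted from the day of consent and that the intervention never starts later than that deadline.

```python
from datetime import date, timedelta
from typing import Optional


def intervention_start(consent_date: date, pre_measure_completed: Optional[date]) -> date:
    """Hypothetical start-date rule (assumption: the week runs from consent)."""
    deadline = consent_date + timedelta(days=7)
    if pre_measure_completed is not None and pre_measure_completed < deadline:
        # Normal case: start the day after the pre-intervention measure.
        return pre_measure_completed + timedelta(days=1)
    # Pre-intervention measure not completed within a week: start anyway.
    return deadline
```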

All content was entered into the application by the first author through the m-Path back-end system. m-Path is available on both Android and iOS and was distributed to participants through Google Play and the App Store, respectively.

Intervention details

DIARY

DIARY (Daily Intervention for Active Recovery) is a 28-day intervention during which participants are prompted once daily to engage with intervention content. Each daily interaction includes a short questionnaire with questions about sleep quality, current mood (e.g., tense, relaxed), and energy levels. Participants were prompted to open the application by a notification at 18:00 each evening. If they had not filled out the questionnaire, a reminder notification was sent at 20:00. The questionnaire closed each night at 03:00, after which it was no longer possible to access that day’s questionnaire. The questionnaire took at most 5 min to complete.
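The daily timing rules can be sketched as follows. The function names are hypothetical, m-Path handled this scheduling in its own back end, and we assume the questionnaire opens together with the 18:00 notification (the text above does not state when it becomes available).

```python
from datetime import date, datetime, time, timedelta

PROMPT_AT = time(18, 0)    # first notification
REMINDER_AT = time(20, 0)  # sent only if the questionnaire is not yet done
CLOSES_AT = time(3, 0)     # on the following calendar day


def daily_window(day: date):
    """Return (prompt, reminder, close) datetimes for one study day."""
    prompt = datetime.combine(day, PROMPT_AT)
    reminder = datetime.combine(day, REMINDER_AT)
    close = datetime.combine(day + timedelta(days=1), CLOSES_AT)
    return prompt, reminder, close


def needs_reminder(day: date, completed_at=None) -> bool:
    """True if no response had been submitted by the 20:00 reminder."""
    _, reminder, _ = daily_window(day)
    return completed_at is None or completed_at > reminder


def is_open(day: date, now: datetime) -> bool:
    """True if the questionnaire for `day` can still be answered at `now`."""
    prompt, _, close = daily_window(day)
    return prompt <= now < close
```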

Upon completing the daily questionnaire, participants received a prompt: a “bite-sized” piece of information about stress and recovery together with a suggestion for a specific recovery strategy. A sample prompt about micro-breaks reads: “Another way to recover from work stress is to take breaks during the day – moments of relaxation when you completely let go of work demands. However, when there is lots going on and we feel stressed it can be difficult to find the time for longer breaks. Perhaps there are no clear opportunities for resting in between tasks. In these cases, it is especially important to do short interruptions – micro-breaks – to sit down, close your eyes, and breathe deeply for a minute.”

Three sets of prompts were developed, each promoting a different kind of recovery strategy: (1) social support, (2) psychological strategies, and (3) physical activity. These formed three versions of the intervention, each given to a separate group of participants. Dividing participants into three groups served both to test different versions of the intervention and to investigate how to randomize participants to experimental groups in a future randomized trial. All versions of the intervention were identically structured (apart from the recommended recovery strategies) so that differences in outcomes could be attributed to the recovery strategies rather than to the intervention format, engagement levels, adherence, technology, or other confounding factors.

Figure 2 shows how the intervention appears on the phone. First, self-monitoring is completed through multiple-choice questions answered sequentially. Next, a text describing a particular recovery strategy is presented. Users can also track their monitored values in graphs.

Fig. 2

Screenshots of the DIARY mobile application. Shows how participants respond to multiple-choice questions (left), receive information regarding recovery strategies (middle), and may track their monitored data (right)

Intervention versions

Social support

One version of the intervention prompted users to engage in social support, which is thought to buffer against the negative effects of stress [26, 27]. This effect is also present in occupational settings: several studies indicate that social support plays an important role in preventing burnout among nurses [28, 29]. Furthermore, studies of workplace interventions targeting social support suggest that these have positive effects on mental health [30, 31]. Sample strategies included asking co-workers for help, listening with compassion, and sharing authentic emotions.

Physical activity

Another version of the intervention promoted increased physical activity in daily life. Physical activity is well known to improve various health outcomes similar to our outcomes of interest, for instance by reducing stress and burnout symptoms [32, 33]. Additionally, workplace physical activity interventions are widely used and have been found effective in many studies [34, 35]. Sample strategies included taking walks, going to the gym, and using the stairs instead of the elevator.

Psychological strategies

A final version of the intervention promoted a variety of psychological strategies for stress reduction. Sample strategies included sleep quality improvement tips, mindfulness, and work detachment – evidence-based strategies that have a positive effect on outcomes of interest [36, 37]. Workplace interventions targeting these kinds of strategies have been found to be effective [38, 39].

Participants

Radiology nursing students from Karolinska Institutet were invited to take part in the study. An e-mail from the research group was sent out to all students (N = 90) in three classes of the radiology nursing program. The e-mail contained an invitation to take part in the study, participant information, as well as instructions for how to download m-Path and join the study. Additionally, a member of the research group gave a short presentation about the study to all invited participants. Informed consent was obtained through the application prior to collecting any data. Table 1 shows demographic data from all participants.

Table 1 Descriptive data from all samples

Although the study population (students) seemingly differs from our target population (workers), there are several reasons why this population is relevant for this trial. First, the program is a specialist nursing education with a strong focus on preparing students for working life, similar to a vocational program. Significant portions of the education, including the period of data collection, involve on-site training under real-world working conditions. Additionally, the mean age (around 30 years) is that of an adult of working age rather than that of a typical student. Importantly, the early-career period, when people are new to their jobs, is a risk factor for stress-related health problems, making this group a key target demographic for our intervention [40, 41].

Outcomes

Figure 3 presents all outcomes and their operationalizations.

Fig. 3

Primary outcome measures and their operationalization

The data collection procedure was evaluated through the recruitment rate, the retention rate, and the performance of the randomization algorithm. The recruitment rate was calculated as the percentage of invited individuals who joined the study (i.e., gave informed consent). The retention rate was calculated as the percentage of recruited participants who completed each measure (pre-, post-, follow-up). The randomization algorithm was evaluated by examining the distribution of participants across the three intervention groups.
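The two rate definitions can be made concrete with a minimal sketch. The function names are hypothetical; the example counts are those reported in the Results.

```python
def rate(part: int, whole: int) -> float:
    """Percentage of `whole` represented by `part`."""
    if whole <= 0:
        raise ValueError("denominator must be positive")
    return 100 * part / whole


def data_collection_rates(invited: int, recruited: int, completed: dict) -> dict:
    """Recruitment relative to all invited; retention relative to all recruited.

    `completed` maps a measurement occasion (e.g. "pre") to the number of
    participants who completed that measure.
    """
    rates = {"recruitment": rate(recruited, invited)}
    for occasion, n in completed.items():
        rates[f"retention_{occasion}"] = rate(n, recruited)
    return rates


rates = data_collection_rates(90, 16, {"pre": 15, "post": 11, "follow_up": 7})
```

Note that retention uses the 16 recruited participants, not the 90 invited, as the denominator.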

Engagement was measured using adherence and the App Engagement Scale. Adherence was operationalized as a count variable coded 0–28 representing the number of days on which a given participant used the intervention. The App Engagement Scale is a 7-item questionnaire designed to measure engagement with mobile applications [42], translated into Swedish by the research team. This translation has been used previously by the research team and has preliminary evidence of good reliability [43]. Items (e.g., “I enjoyed using the app”) are scored on a 1–5 ordered categories scale (1 = Not at all, 5 = Fully agree). This scale was administered only at the post-intervention measure.
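A minimal scoring sketch for the two engagement measures follows. The function names and the input format (a list of day indices 1–28 on which the daily questionnaire was completed) are assumptions, and we assume the App Engagement Scale is summarized as the mean of its seven items, consistent with it being reported on the 1–5 response scale.

```python
from statistics import mean


def adherence_days(completed_days, n_days=28):
    """Count distinct intervention days (1..n_days) with a completed questionnaire."""
    return len({d for d in completed_days if 1 <= d <= n_days})


def aes_score(item_responses):
    """Mean of the seven App Engagement Scale items, each scored 1-5 (assumed summary)."""
    items = list(item_responses)
    if len(items) != 7:
        raise ValueError("expected 7 items")
    if not all(1 <= r <= 5 for r in items):
        raise ValueError("items are scored 1-5")
    return mean(items)
```

Dividing adherence by 28 gives the adherence rate; for the sample mean reported in the Results, 14.3 / 28 ≈ 0.51.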

Acceptability was measured using a set of single-item measures evaluating whether the intervention was relevant to the user, whether they would like to use it again, the quality of the prompts, and technical stability. These items were included only in the post-intervention measure. All acceptability questions and response options are available in Appendix A.

Measurement quality was evaluated based on the completion rate and the within-person variability of the daily measures. The completion rate was calculated as the percentage of questionnaire items answered by participants. Within-person variability was assessed with the intraclass correlation coefficient (ICC) from an intercept-only multilevel model with daily stress as the outcome; the complement of the ICC (1 − ICC) gives the proportion of variance that lies within persons. The following are the outcomes and corresponding questionnaires planned for the randomized controlled trial:

  • Exhaustion and disengagement from work were measured using the Oldenburg Burnout Inventory, an instrument designed to measure burnout in an occupational context including the dimensions exhaustion and disengagement [44]. This study used a Swedish translation with a subset of 7 items [45]. Items (e.g., “after work I often feel tired and exhausted”) are scored on a 4-point ordered categories scale (1 = Not at all, 4 = Exactly).

  • Emotional exhaustion was measured using the Shirom-Melamed Burnout Questionnaire (SMBQ), an instrument designed to assess burnout symptoms [46]. The study used a Swedish translation and subset of 6 items focused on the emotional exhaustion dimension of burnout [47, 48]. Items (“My batteries are empty”) are rated on a 1–7 ordered categories scale (1 = Almost never, 7 = Almost always).

  • Anxiety was measured using the GAD-7 questionnaire, a 7-item instrument designed to assess generalized anxiety disorder [49]. This study used a Swedish translation [50]. Items (e.g., “Feeling nervous, anxious, or on edge”) were scored on a 1–4 ordered categories scale (1 = Not at all, 4 = Nearly every day).

  • Recovery was measured using the Recovery Experience Questionnaire, a 16-item questionnaire designed to measure four dimensions of recovery – detachment, relaxation, autonomy, and mastery – using four items for each dimension [51]. This study used a Swedish translation [52]. Items (e.g., “In my free time I don’t think about work”) are scored on a 1–7 ordered categories scale (1 = Almost never, 7 = Almost always).

  • Mindfulness was measured using the Mindful Attention Awareness Scale, a 15-item measure designed to assess attention and awareness of “what is occurring in the present moment” [53]. This study used a Swedish translation with six items centered around emotional self-awareness [54]. Items (“I could be experiencing some emotion and not be conscious of it until some time later”) are rated on a 1–7 ordered categories scale (1 = Almost never, 7 = Almost always).

  • Stress was measured daily as the mean value of three items inspired by the Stress-Energy Questionnaire [55]. This study used a Swedish translation which has been validated in a previous study by the research team (Lukas J, Kowalski L, Bujacz A: Psychometric properties of the daily measurement of stress in a daily diary study of Swedish Healthcare Workers, in preparation). Items (“During the last day, to what extent have you felt tense / pressed / frustrated?”) were rated on a 6-point ordered categories scale (1 = Not at all, 6 = Very much). This variable was measured daily during the intervention and was not included in the pre-, post-, and follow-up measures.
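The scoring of the questionnaires above can be sketched as follows. The text specifies a mean only for daily stress, so computing the other scale scores as item means is an assumption, as are the dictionary keys. The optional inversion mirrors the note to Fig. 4, where the mindfulness scale is inverted so that higher values indicate more mindfulness; item-level reverse keying within the original instruments is not handled here.

```python
from statistics import mean

# Response ranges (min, max) as described above; keys are hypothetical labels.
SCALES = {
    "olbi": (1, 4),
    "smbq": (1, 7),
    "gad7": (1, 4),
    "req": (1, 7),
    "maas": (1, 7),
    "daily_stress": (1, 6),
}


def scale_score(scale, items, invert=False):
    """Mean item score; optionally inverted so that higher = more of the construct.

    Inversion maps x to (min + max - x), e.g. 1 <-> 7 on a 1-7 scale.
    """
    lo, hi = SCALES[scale]
    items = list(items)
    if not items:
        raise ValueError("no items")
    if any(not lo <= x <= hi for x in items):
        raise ValueError(f"{scale} items must lie in {lo}-{hi}")
    values = [lo + hi - x for x in items] if invert else items
    return mean(values)
```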

Results

Data collection procedure

Recruitment & retention

Sixteen of the 90 invited individuals were recruited, a recruitment rate of 17.8%. Fifteen participants (93.8%) completed the pre-intervention measure, 11 (68.8%) completed the post-intervention measure, and 7 (43.8%) completed the follow-up measure.

Randomization algorithm

The randomization process generated slightly uneven intervention group sizes (N = 7, N = 5, N = 4). The imperfect distribution was likely due to selecting block sizes which were too large for the number of participants who joined the study. Block sizes need to be adjusted to fit the number of participants in order to ensure an equal group distribution.
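The explanation can be checked with simple arithmetic. Complete blocks are always balanced, so imbalance can only come from the final, partially filled block, where each of the three arms receives at most block size / 3 participants. The hypothetical helper below computes the largest possible difference between the biggest and smallest group.

```python
def worst_case_imbalance(block_size: int, arms: int, assigned_in_last_block: int) -> int:
    """Largest possible (max - min) group-size difference after complete blocks
    followed by a partially filled last block.

    Complete blocks are perfectly balanced, so any imbalance comes from the
    last block, where each arm can receive at most block_size // arms people.
    """
    quota = block_size // arms
    m = assigned_in_last_block
    if not 0 <= m <= block_size:
        raise ValueError("assigned_in_last_block must be between 0 and block_size")
    largest = min(quota, m)                   # one arm fills up first
    smallest = max(0, m - (arms - 1) * quota)  # another arm gets what is left
    return largest - smallest
```

With blocks of 9, the difference can reach 3, which matches the observed 7 versus 4 (for example, 12 participants in balanced blocks followed by 4 allocated from a block of 9 would be consistent with it); with blocks of 3, it can never exceed 1.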

Engagement

Protocol adherence

Participants completed on average 14.3 (SD = 8.01) of the 28 intervention days, an adherence rate of 51%. Adherence varied widely, ranging from 4 to 28 days.

App Engagement Scale

Complete results from the App Engagement Scale (M = 4.36, SD = 0.66) are presented in Table 2.

Table 2 Engagement and acceptability metrics collected at post-intervention measure

Acceptability

Perceived benefit

Table 2 shows results from all single-item measures of perceived benefit. Mean ratings ranged from 2.82 to 3.36 on a 1–4 scale (one item, rated on a 1–6 scale, had a mean rating of 4.55).

Technical stability

Five of the 11 participants experienced no technical issues at all. Three participants experienced some technical issues, for example being unable to open questionnaires or enter responses; however, their written comments indicate that these issues were minor and did not cause substantial problems. Unfortunately, data from three participants are missing due to issues with data retrieval.

Measurement quality

Figure 4 shows all outcome measures for all groups across all time-points. Although this study was not designed to evaluate the effectiveness of the intervention, the outcomes in this sample moved in a positive direction. Mindfulness and recovery experiences increased over time, while symptoms of emotional exhaustion, anxiety, and exhaustion and disengagement from work decreased across the time-points.

Fig. 4

Outcome results for all experimental groups at all measures. Note: The follow-up measure for the social support group is excluded because only one participant completed it. The mindfulness scale has been inverted so that higher values indicate more mindfulness

Completion rate

Participants responded to all items in the measures they took part in, yielding a 100% item completion rate. See Sect. “Measurement quality” for a discussion of this topic.

Within-person variance

An intercept-only multilevel model with daily stress as the outcome yielded an intraclass correlation coefficient (ICC) of 0.42. The ICC is the proportion of the total variance that is attributable to stable differences between persons:

$$\mathrm{ICC}=\frac{\text{between-person variance}}{\text{between-person variance}+\text{within-person variance}}$$

Lower values indicate that a larger proportion of the variability in the outcome measure is due to within-person differences. An ICC of 0.42 implies that 58% (1 − 0.42) of the variability reflects within-person changes over time. This is a substantial share, suggesting that the measure is sensitive to within-person differences over time and that this dimension is relevant for studying intervention effects.
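For readers who want to reproduce the variance decomposition, the sketch below estimates the ICC with the classical one-way ANOVA (method-of-moments) estimator, which handles unequal numbers of days per participant. This is a swapped-in technique: the study fitted an intercept-only multilevel model, which is typically estimated by (restricted) maximum likelihood, so estimates may differ somewhat. A fitted-model equivalent could use, for example, `statsmodels`' `mixedlm` with a random intercept per participant, reading the between-person variance from `cov_re` and the within-person variance from `scale`. The function name and input format here are hypothetical.

```python
from statistics import mean


def icc_oneway(scores_by_person):
    """ANOVA (method-of-moments) estimate of ICC(1) for possibly unbalanced data.

    scores_by_person: dict mapping a participant ID to that participant's list
    of daily stress scores. Returns (icc, between_var, within_var).
    """
    data = {pid: list(v) for pid, v in scores_by_person.items() if v}
    k = len(data)
    n_total = sum(len(v) for v in data.values())
    if k < 2 or n_total <= k:
        raise ValueError("need at least two participants with repeated measures")
    person_means = {pid: mean(v) for pid, v in data.items()}
    grand_mean = mean([x for v in data.values() for x in v])
    ss_between = sum(len(v) * (person_means[pid] - grand_mean) ** 2 for pid, v in data.items())
    ss_within = sum((x - person_means[pid]) ** 2 for pid, v in data.items() for x in v)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    # n0: effective number of days per participant when day counts differ.
    n0 = (n_total - sum(len(v) ** 2 for v in data.values()) / n_total) / (k - 1)
    between_var = max(0.0, (ms_between - ms_within) / n0)  # truncated at zero
    within_var = ms_within
    total = between_var + within_var
    if total == 0:
        raise ValueError("no variance in the data")
    return between_var / total, between_var, within_var
```

`1 - icc` then gives the within-person share of variance, the quantity interpreted above (1 − 0.42 = 0.58).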

Discussion

Key results

Overall, the results indicate that the intervention and study design are feasible for a full-scale randomized controlled trial. The study shows a promising recruitment rate but somewhat low retention rates, providing an important guideline for how many participants should be invited to reach a target sample size. Engagement is satisfactory, with decent adherence and high app engagement ratings. Acceptability metrics are very promising overall, though the quality of the prompts needs improvement. Measurement quality is good overall, with a high completion rate and substantial within-person variability. A few adjustments are recommended to further refine the intervention and study protocol before conducting an RCT.

Data collection procedure

The data collection procedure indicated a low retention rate but provided an important benchmark for how many participants need to be invited. The randomization algorithm produced an uneven distribution and needs to be improved.

Recruitment and retention

Of the invited individuals, 17.8% chose to take part in the study, indicating that roughly 1 in 5 invitees will join. A similar rate has been observed in another study population: Kowalski et al. [43] conducted a comparable study in which 24% of invited participants were recruited, suggesting that recruitment rates around this level are typical for these kinds of studies. While these numbers may seem low, it is important to keep in mind that this is a real-world context rather than typical clinical-trial recruitment. Because participation was offered broadly on a voluntary basis, lower recruitment rates should be expected than when targeting eligible candidates who receive compensation.

Importantly, however, retention rates drop off markedly, especially for the follow-up measure, which was completed by 44% of participants. Given the relatively low retention rate, the study protocol should be improved to mitigate this drop-out effect, for instance through reminder e-mails and general encouragement to keep participating in the study. Notably, the follow-up retention rate observed in this study may be especially low because the follow-up measure coincided with a vacation period, when participants may have been less inclined to respond.

Finally, rather than viewing recruitment and retention rates only as problems to be solved (though efforts should certainly be made to maximize both), these numbers provide an important benchmark for how many participants need to be invited to reach a target sample size. Based on the results of the current study, we need to invite 10–12 times as many people as are needed in the final statistical analysis. Given that the planned RCT has wide inclusion criteria as well as a flexible, continuous, and automated recruitment process, it is feasible to invite sufficiently large numbers of participants.
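The benchmark can be expressed as a simple calculation. Taking the pilot rates at face value and assuming they carry over unchanged to the full trial, the multiplier is about 8.2 (90/11) if the analysis needs post-intervention data and about 12.9 (90/7) if it needs follow-up data, so the exact factor depends on which measurement occasion the analysis requires. The function name and the target of 100 completers below are hypothetical.

```python
import math


def invitations_needed(target_n: int, recruitment_rate: float, retention_rate: float) -> int:
    """Invitations required so that the expected number of completers reaches target_n.

    Assumes the pilot recruitment and retention rates carry over unchanged.
    """
    yield_per_invite = recruitment_rate * retention_rate
    return math.ceil(target_n / yield_per_invite)


# Pilot rates: 16 of 90 recruited; 11 of 16 at post, 7 of 16 at follow-up.
post = invitations_needed(100, 16 / 90, 11 / 16)
follow_up = invitations_needed(100, 16 / 90, 7 / 16)
```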

Randomization algorithm

One problem with the data collection procedure was the uneven distribution of participants across the intervention groups (N = 7, N = 5, N = 4). The randomly selected block sizes (3, 6, 9) were too large for the number of participants, resulting in an imperfect distribution. Evenly sized groups are important for future data analysis, so we will adjust the block sizes accordingly.

Engagement

The intervention had satisfactory adherence and received very high app engagement ratings.

Adherence varied greatly among participants with some using it daily and others using it 4–5 times, averaging a 51% adherence rate (14.3 out of 28 days). Importantly, engagement with digital interventions is a widespread challenge with many studies reporting low adherence rates [21, 56]. Adherence also varies widely depending on the type of intervention, making it difficult to interpret a given rate without sufficient reference to the specific context.

Given the context of this study and project, a 51% adherence rate is deemed satisfactory. First, it is an improvement over a prior version of the intervention, which had an average adherence of 39% [43]. In addition, DIARY is an unguided intervention that users engage with wholly on their own terms; the instructions even state that users may use the intervention exactly as often as they like. Compared with guided interventions (for example, receiving support from a therapist and following an explicit treatment plan), unguided interventions typically show lower adherence [57, 58].

Lastly, adherence rates could likely be further improved based on findings from a previous study investigating user engagement with DIARY [43]. For instance, involving employers to encourage intervention use and increasing use intention among participants are additional measures that could be included in the study protocol to increase engagement. Other studies suggest that tailoring and social influence are key factors for promoting engagement, something that may be included in future versions of DIARY [22, 59].

Results from the App Engagement Scale indicate that the intervention and app design are sufficiently user-friendly and engaging to participants. The App Engagement Scale had a mean score of 4.36 out of 5, which is a very positive rating [42]. This questionnaire regards the user experience of the mobile application, asking if users find it easy, enjoyable, and motivating to use. Notably, this score is a substantial improvement over the rating 3.44 observed for a prior version of the intervention using a different digital tool [43].

Acceptability

Participants found the intervention overall acceptable and technically stable, though the prompts need improvement.

Perceived effectiveness

Single-item acceptability metrics indicate that participants were overall satisfied with the intervention and found it suitable. On average, participants found the intervention content “mostly relevant” and would “most likely” want to use such an app again in the future (see Table 2). These are both promising metrics, indicating that the content of the intervention is relevant to this population and that it was sufficiently well-designed and helpful that they would want to access it again.

Some ratings of the intervention’s perceived effectiveness were slightly lower: participants did not find the prompts very useful (2.82 out of 4) and did not fully feel that the intervention helped them deal with challenges in life more effectively (4.55 out of 6). These results indicate that the prompts may need to be refined to be more helpful to participants. Comments from this study and qualitative data from a previous study on DIARY [43] indicate that the prompts may be too simple and/or repetitive to be optimally beneficial.

One way of improving the prompts would be to base them on a well-established framework outlining a variety of effective strategies for optimizing recovery processes. A problem with the current prompts which became evident during development was that the underlying recovery “type” for each intervention version (social support, psychological strategies, physical activity) was too narrowly defined, resulting in prompts being quite repetitive and one-dimensional. In effect, the same recovery strategy was suggested repeatedly with minor modifications.

Rather than trying to isolate the “best” type of recovery strategy and center a whole intervention around it, it may be more fruitful to recommend a wide range of recovery strategies to users. Most models of well-being include multiple components and needs, suggesting that several types of strategies may contribute to improving mental health [60, 61]. Including a wide repertoire of recovery strategies may thus be conducive to optimal recovery.

Providing a variety of recovery strategies may be beneficial for other reasons as well. First, it increases the likelihood that users find a strategy that is feasible on a given day and that suits diverse lifestyles. Second, recovery may be most effective when it matches current needs, because different stressors require different types of recovery to optimally mitigate their negative effects [62]. For example, relaxation exercises might help users unwind after a cognitively demanding day, while talking to a close friend may be more appropriate after a day of high emotional demands at work. A larger toolbox of recovery strategies makes it more likely that users will find the strategies most beneficial to them at any given moment.

One way to include a varied, well-balanced, and theoretically grounded set of recovery strategies would be to craft prompts based on the DRAMMA framework [63]. This framework integrates various models of recovery and well-being, outlining six types of leisure-time experiences that support mental health: detachment, relaxation, autonomy, mastery, meaning, and affiliation. Interventions using this model have been found to be effective in improving relevant outcomes in a working population [64]. Developing prompts according to a well-rounded framework that covers a large variety of recovery strategies makes it more likely that the prompts will be helpful to users and will address a wider range of recovery needs.

Technical stability

The results also indicate that the intervention is technically stable overall, with several participants reporting no technical issues at all. The few reported technical difficulties were minor and did not cause users substantial problems. This is a clear improvement over a previous iteration of DIARY, which had considerable technical problems [43]. Another study using m-Path also found the software to be technically stable, with acceptable usability ratings [65]. These results are promising; nevertheless, efforts will be made to resolve any remaining technical issues before future studies.

Measurement quality

Completion rates were very high, with a potential caveat. Daily measures showed substantial within-person variability. Some outcome measures may need to be changed to better answer the research questions.

Completion rate

Participants provided complete data for every measure they took part in, answering all items of all questionnaires. Although a 100% completion rate is considered excellent, it may also reveal a potential problem with the data collection procedure. Participants did not have the option of skipping any questions, and responses were not saved on the server until participants had completed the entire questionnaire. As a result, only fully completed measures were registered, which produces a 100% completion rate by design. It is possible that some participants stopped midway through a measure and thus did not have their partial responses registered. The low retention rate may therefore partly reflect participants who answered part of a measure but were not registered as having completed it.

Because of the strict criteria for registering data (no option to skip questions, and registration of fully completed measures only), valuable data may be lost. One way to mitigate this issue is to loosen these criteria, for instance by giving participants the option to skip questions. Additionally, the data collection system can be adapted so that partial data is registered in the database. This will likely yield more data, even if some of it is incomplete, and may have the added benefit of improving retention rates.
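To illustrate how the two registration strategies differ, the Python sketch below stores each answer as soon as it is given and allows skipped items (recorded as `None`). It then compares an item-level completion rate over all sessions with the rate obtained when only fully completed sessions are kept, which is 100% by construction. The class and function names (`QuestionnaireSession`, `item_completion_rate`, `keep_complete_only`) are hypothetical; this is not m-Path's data model or API, which is not described here.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class QuestionnaireSession:
    """Hypothetical record of one participant answering one measure.

    Answers are stored item by item as they are given, so a session that is
    abandoned midway still leaves its partial data behind. None marks an
    item the participant chose to skip.
    """
    participant_id: str
    items: List[str]
    answers: Dict[str, Optional[int]] = field(default_factory=dict)

    def record(self, item: str, value: Optional[int]) -> None:
        if item not in self.items:
            raise KeyError(f"unknown item: {item}")
        self.answers[item] = value  # saved immediately, not only on final submit

    def is_complete(self) -> bool:
        return all(self.answers.get(item) is not None for item in self.items)


def item_completion_rate(sessions: List[QuestionnaireSession]) -> float:
    """Proportion of presented items that received an answer."""
    presented = sum(len(s.items) for s in sessions)
    answered = sum(1 for s in sessions for v in s.answers.values() if v is not None)
    return answered / presented if presented else 0.0


def keep_complete_only(sessions: List[QuestionnaireSession]) -> List[QuestionnaireSession]:
    """Mimics all-or-nothing registration: partial sessions are never stored."""
    return [s for s in sessions if s.is_complete()]


if __name__ == "__main__":
    items = ["q1", "q2", "q3"]
    full = QuestionnaireSession("p1", items)
    for item in items:
        full.record(item, 3)
    partial = QuestionnaireSession("p2", items)
    partial.record("q1", 4)
    partial.record("q2", None)  # skipped; participant then stopped before q3

    sessions = [full, partial]
    print("all-or-nothing:", item_completion_rate(keep_complete_only(sessions)))
    print("item level:", item_completion_rate(sessions))
```

Under all-or-nothing registration the abandoned session simply disappears, so the completion rate cannot fall below 100%; storing answers item by item keeps the partial session and makes the true completion rate visible.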

Another important question is whether the outcome measures are appropriate for fully understanding the intervention effects. Most outcome measures proved relevant; however, given the suggested changes to the intervention, new outcome measures may be more appropriate. The Recovery Experience Questionnaire (REQ) does not capture all dimensions of the DRAMMA framework (for instance, the Affiliation and Meaning dimensions are missing from this instrument) and may thus not provide information about all types of recovery strategies. The DRAMMA-Q may therefore be better suited to give a comprehensive picture of the various recovery strategies [66].

Within-person variability

The daily stress measure proved useful for measuring individual change over time: within-person variability accounted for 58% of the observed variance, with the remaining 42% reflecting differences between persons. This indicates that measuring stress at the daily level is important for capturing participants’ experiences and may yield insights into how stress fluctuates from day to day. These insights can in turn be used to further improve interventions and other efforts to mitigate the negative consequences of stress.
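How the 58% figure was estimated is not stated here. One common approach, assumed for illustration, is a one-way random-effects ANOVA: with n persons each rated on k days, the within-person variance is estimated as MS_within, the between-person variance as (MS_between - MS_within) / k, and the within-person share as their ratio to the total, i.e. one minus the ICC(1). The sketch assumes a balanced design (every person rated on the same number of days) and uses made-up scores rather than the study's data; a multilevel model fitted to unbalanced daily data, as is typical in diary studies, may give somewhat different estimates.

```python
from statistics import mean
from typing import Sequence


def within_person_share(scores_by_person: Sequence[Sequence[float]]) -> float:
    """Share of variance in repeated daily scores due to within-person
    fluctuation (1 - ICC(1)), estimated with a one-way random-effects ANOVA.

    Assumes a balanced design: every person has the same number of days.
    """
    n = len(scores_by_person)
    if n < 2:
        raise ValueError("need at least two persons")
    k = len(scores_by_person[0])
    if k < 2 or any(len(s) != k for s in scores_by_person):
        raise ValueError("need a balanced design with at least two days per person")

    grand_mean = mean(x for s in scores_by_person for x in s)
    person_means = [mean(s) for s in scores_by_person]

    # Sums of squares between persons (around the grand mean) and within persons.
    ss_between = k * sum((m - grand_mean) ** 2 for m in person_means)
    ss_within = sum((x - m) ** 2
                    for s, m in zip(scores_by_person, person_means)
                    for x in s)

    ms_between = ss_between / (n - 1)
    ms_within = ss_within / (n * (k - 1))

    # Variance components; a negative between-person estimate is truncated at 0.
    var_between = max((ms_between - ms_within) / k, 0.0)
    var_within = ms_within
    total = var_within + var_between
    if total == 0:
        raise ValueError("scores show no variance")
    return var_within / total


if __name__ == "__main__":
    # Hypothetical daily stress ratings: 3 persons x 4 days.
    daily_stress = [[3, 5, 2, 4], [6, 7, 5, 6], [2, 2, 4, 3]]
    print(f"within-person share: {within_person_share(daily_stress):.2f}")
```

A share close to 1 means that most variation occurs from day to day within the same person, which is what makes daily measurement informative; a share close to 0 would mean a single measurement per person captures most of the variance.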

Limitations

A primary limitation of this study is the relatively small and homogeneous participant pool. Although small samples are common in pilot studies, a restricted sample may not be large enough to uncover the full range of potential problems with the study design. A larger sample is more likely to fully “test” all aspects of the study protocol, reducing the risk that unresolved issues will only become apparent during a full-scale trial.

Additionally, because all participants were university students recruited from the same location, the sample differs considerably from the target population, which limits the generalizability of the findings. The full-scale trial intends to include a heterogeneous sample from a working population, covering different occupations, locations, and age groups. Because the participant demographics of this study do not reflect the target population of the future trial, the conclusions drawn from these results may lack sufficient external validity. Thus, this study may not accurately predict issues that could arise with a different and more heterogeneous participant pool.

However, because the educational program resembles a vocational program, with students spending considerable time in real-life working conditions, the sample may still reflect our target population in meaningful ways. In addition, the mean age of 30 is a relevant demographic characteristic, given that early-career professionals may be particularly in need of this kind of intervention [40, 41].

A final limitation is that there is insufficient data to thoroughly analyze prompt quality and to understand how the prompts can be further improved. A nuanced understanding of prompt quality would require comparing ratings of individual prompts. However, because of the small sample size in each intervention group and imperfect adherence, there are few ratings per prompt, so a more in-depth statistical analysis of prompt quality is not feasible.

Technical limitations of m-Path include user data being stored on the device rather than in a user account, which reduces both security and flexibility. Because users do not have an account, they cannot access the intervention or their data from any device other than their own phone. Users also cannot log out of their m-Path profile, so their data cannot be protected from anyone with access to their phone.

Conclusion

The overall results indicate that the study protocol and intervention design, with some modifications, are feasible for conducting a large-scale randomized controlled trial. By changing the way that data is registered in the database, we may collect more data and likely improve retention rates. Block sizes of the randomization algorithm will be adapted to better match the sample size in order to ensure equally sized experimental groups. New prompts will be crafted based on the DRAMMA model to improve the acceptability of the intervention. Some outcome measures will be changed to provide a more comprehensive picture of intervention effects.
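The randomization algorithm is not detailed in this section. Assuming permuted block randomization, a minimal Python sketch shows why block size matters for group balance: each block contains every arm equally often in shuffled order, so groups are exactly equal whenever the number of participants is a multiple of the block size, whereas a final incomplete block can leave groups unequal by up to block_size / number_of_arms participants. The function name, arm labels, block sizes, and seeds are hypothetical.

```python
import random
from collections import Counter
from typing import List, Optional, Sequence


def permuted_block_schedule(arms: Sequence[str], block_size: int,
                            n_participants: int,
                            seed: Optional[int] = None) -> List[str]:
    """Allocation sequence from permuted blocks: each block holds every arm
    equally often, in a randomly shuffled order."""
    if not arms or block_size <= 0 or block_size % len(arms) != 0:
        raise ValueError("block size must be a positive multiple of the number of arms")
    rng = random.Random(seed)
    per_arm = block_size // len(arms)
    schedule: List[str] = []
    while len(schedule) < n_participants:
        block = [arm for arm in arms for _ in range(per_arm)]
        rng.shuffle(block)
        schedule.extend(block)
    # The last block may be cut off, which is where imbalance can arise.
    return schedule[:n_participants]


if __name__ == "__main__":
    arms = ["arm_A", "arm_B", "arm_C"]  # hypothetical arm labels
    # 12 participants, block size 6: sample is a multiple of the block size.
    print(Counter(permuted_block_schedule(arms, 6, 12, seed=1)))
    # 10 participants, block size 12: the only block is cut off.
    print(Counter(permuted_block_schedule(arms, 12, 10, seed=1)))
```

Smaller blocks limit the possible imbalance in a small sample but make upcoming allocations easier to predict, which is the usual trade-off when choosing a block size.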