Background

The experience sampling method (ESM) is an intensive longitudinal research method [1]. ESM is conducted in real world settings as a participant goes about their daily life [2]. Participants complete self-report questions about transient experiences at multiple times, typically followed by questions relating to their current environment or context [3]. Prior to the advent of digital technologies, ESM involved filling in a diary or booklet [4]. Most ESM designs are now computerised, allowing researchers to identify the exact time a momentary assessment was completed [5]. This review focuses specifically on digital ESM, both because paper-based approaches are increasingly redundant and because data completeness, the focus of the review, is likely to be strongly influenced by the data collection approach.

ESM can provide an accurate assessment of phenomena as they occur [2]. It allows researchers to gain more ecologically valid insights into the impact of daily events on participants, which are difficult to measure under laboratory conditions [6]. ESM can be used to examine temporal precedence between variables [2]. By asking participants to report experiences over a period of time, researchers can investigate fluctuations between variables which may not be captured using other methods [7].

ESM has been used widely in mental health research [8], and a review of its use has identified a number of applications including improving understanding of symptoms and social interactions, identifying causes of symptom variation and evaluating treatments [9]. ESM is a valid approach when capturing mental health states in participants with psychosis [10].

Data completeness is a particular challenge in ESM. Missing data is common in research using ESM methods [11]. Data incompleteness can occur for a number of reasons; for example, participants may find ESM burdensome and time consuming [2], which reduces adherence to the study protocol and in turn reduces data quantity [12] and quality [13]. Incomplete data sets can cause important aspects of experience to be overlooked by researchers and can also bias the statistical models used for analysis [14]. People with psychosis have been shown to be less adherent to ESM study protocols than the general population [13]. Studies that recruit people with psychosis have higher rates of participant withdrawal, resulting in fewer participants included in final analyses [15].

ESM design

Conducting an ESM study involves making several design decisions [16], for example when and how frequently participants answer questionnaires. A questionnaire prompt may be sent to participants at pre-defined intervals (time contingent protocol), scheduled at random times (signal contingent protocol) or completed when a predefined event has occurred (event contingent protocol) [17]. Studies can also use hybrid designs, which combine sampling protocols [9]. Setting the frequency of questionnaire prompts involves consideration of participant burden as well as how rapidly the target phenomenon is expected to vary [4].

There is evidence that design decisions influence completion rates. For example, longer questionnaires have been associated with higher levels of participant burden [18]. Protocol adherence has been shown to reduce over time, and also to be dependent on the time of day a questionnaire is received [13]. A systematic review investigating compliance with study protocols and retention in ESM studies in participants with severe mental illness found that frequent assessments and short intervals between questionnaires reduce data completeness, and increasing participant reimbursement increases data completeness [15].

There is a need for greater consistency in the design of ESM studies [9]. ESM is a collection of methods and is usually reported in relation to general characteristics rather than a defined set of design options [4]. When designing an ESM study, researchers have insufficient evidence on which to base design decisions [18]. Designs of ESM studies are often based on individual research questions [16], leading to a large heterogeneity of designs [15]. Additional methodological research is needed in order for studies to be replicable and standardised [9].

Developing consistency in design is impeded by the absence of a typology of design decisions. No typology for ESM design choices currently exists. A typology could help to define and classify ESM research methods [19], increasing both methodological rigour in developing and reporting individual studies and the ability to compare or combine findings.

Review aims

The aim of this systematic review is to characterise the design choices made in digital ESM studies monitoring the daily lives of people with psychosis. The objectives in relation to ESM studies involving people with psychosis are:

  • (1) to develop a typology of design choices used in digital ESM studies and

  • (2) to synthesise evidence relating data completeness to different ESM design choices.

Methods

A systematic review of the literature was carried out following PRISMA guidance [20]. Studies published in academic journals that met inclusion criteria were assessed for methodological quality. The constant comparative method [21] was used to identify design decisions to produce a typology. Weighted regression was used to identify design decisions that predicted data completeness.

Eligibility criteria

Inclusion criteria:

  • Participants: papers that reported on participants with a clinical or research diagnosis of psychosis, either as a category or by specific diagnosis (e.g., schizophrenia), either as the study population or as a separately reported and disaggregable sub-group

  • Methods: studies using ESM to monitor participants with psychosis

  • Studies which used digital technology to administer ESM

  • Studies which included experience sampling as part of a wider design, e.g. as part of an intervention

  • English language full text articles, reviews and conference abstracts

  • Papers published from January 2009 to July 2021

  • Studies which either reported the completeness of data or gave sufficient data to allow calculation of data completeness where not specifically reported

Exclusion criteria:

  • Studies recruiting non-clinically diagnosed adults, i.e. participants self-reporting psychosis without clinical or research validation

  • Studies using non-digital approaches to data collection

  • Lifelogging, quantified self and other self-tracking approaches used by individuals to record personal data, since these are not research methodologies used to collect data for scientific purposes

Data sources and search strategy

A systematic search was developed and conducted in collaboration with two information specialists with expertise in systematic review searches (authors EY and NT). These data sources and associated search strategy are described below.

Six sources were used.

First, the following electronic databases were searched with a date limit of January 2009 to July 2021 (date of last search): Medline, Embase, PsycInfo (all via Ovid), Cochrane Library, and Web of Science Core Collection. The search terms are described in detail in additional file 1.

Second, the tables of contents of the following journals were hand searched: Journal of Medical Internet Research, Journal of Medical Internet Research (Mental health), Journal of Medical Internet Research (mHealth and uHealth), Journal of Methods in Psychiatric Research, Psychiatric Rehabilitation Journal, Psychiatric Services, Psychological Assessment, Schizophrenia Bulletin and Schizophrenia Research. Issues from 2009 to July 2021 were searched. These journals were chosen as they regularly published recovery-related papers.

Third, web-based searches were conducted using Google Scholar, ResearchGate and Academia.edu. These were searched using the terms ‘experience sampling’ and ‘psychosis’, ‘ecological momentary assessment’ and ‘psychosis’, ‘experience sampling’ and ‘schizophrenia’, and ‘ecological momentary assessment’ and ‘schizophrenia’. Due to the large number of results found on Google Scholar, only the first five pages (100 results) per search string were screened.

Fourth, grey literature searches were conducted using OpenGrey, using the same search terms as the web-based searches.

Fifth, backward citation tracking was conducted by hand-searching the reference lists of all included papers, and forward citation tracking of papers citing included studies was conducted using Scopus and Google Scholar.

Finally, a panel of five experts with expertise in experience sampling methods was consulted for additional studies meeting the inclusion criteria.

Data extraction and appraisal

Eligible citations were collated and uploaded to EndNote, and duplicates were removed. The titles of all identified citations were screened for relevance against the inclusion criteria by ED and FN, who rated all of the studies for inclusion. Full text was obtained for potentially relevant papers and eligibility was decided by the lead author. Data were extracted into an Excel spreadsheet developed as a Data Abstraction Table (DAT) for the review. The complete DAT can be found in Additional file 2.

Quality assessment

In the absence of a typology for reporting of ESM studies, recommended reporting criteria for ambulatory studies [22] were used. Studies were assigned points according to whether they reported the elements of study design recommended by the guidelines. Examples of recommended reporting criteria include ‘explanation of the rationale for the sampling design’ and ‘full description of the hardware and software used to collect data’. The maximum possible score, corresponding to the number of items in the reporting criteria, was 12 points. Studies scoring 0 to 6 were arbitrarily classified as low quality, and studies scoring 7 to 12 as high quality. This rating was carried out by ED.
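For illustration, the scoring logic can be sketched as follows (a minimal sketch: the two criterion names are the examples quoted above, the remaining checklist items are omitted, and the function name is ours):

```python
# Illustrative sketch of the quality scoring: one point per reported item on the
# 12-item reporting checklist [22], then a binary high/low rating.
REPORTING_ITEMS = [
    "rationale_for_sampling_design",       # example item quoted in the text
    "hardware_and_software_described",     # example item quoted in the text
    # ... the remaining 10 recommended reporting items
]

def quality_rating(reported_items: set) -> tuple:
    """Return (score out of 12, 'high' or 'low') for one study."""
    score = sum(item in reported_items for item in REPORTING_ITEMS)
    return score, "high" if score >= 7 else "low"
```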

Subgroup analysis was undertaken for studies included in Objective 2 (predictors of data completeness). Subgroup analysis was not undertaken for the Objective 1 typology because the aim was to develop an exhaustive typology [23].

Data analysis

To meet Objective 1 (design typology), design decisions were iteratively identified from the included papers. A preliminary typology of design decisions was developed by analysts who were familiar with the field of ESM (ED, MC, MS). This preliminary typology was used as the headings in the initial version of the Data Abstraction Table (DAT). The constant comparative method [21] was used to refine the preliminary typology, by combining inductive category coding with simultaneous comparison of the incidents observed [24]. Included papers were coded using the existing DAT headings, and further or combined categories were iteratively identified [25]. The DAT was then structured using all identified design decisions and the corresponding data extracted from each study [26]. Extension and combination of the preliminary typology was achieved through discussion among researchers ED, MC, FN and MS.

To meet Objective 2, the outcome of data completeness was defined as the percentage of questionnaires completed by participants in each study out of the possible total allowed by the study protocol. This was taken directly from the paper where possible. Where the percentage of data completeness was not reported, completeness was calculated by converting the total number of questionnaires completed during the study into a percentage of the total possible questionnaires allowed by the study protocol. The percentage represents the total data completeness for each study. Each study had a different number of questionnaires and a different number of participants; the percentage therefore incorporates variation both between and within participants, as questionnaires were completed over time. Each category from the typology was used as a predictor variable. Additional predictor variables included in the analysis were the mean age of study participants and the percentage of male participants.
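Expressed formally (this notation is introduced here only for clarity and does not appear in the included papers), the study-level outcome is

$$\mathrm{completeness}_j\,(\%) = 100 \times \frac{\sum_{i=1}^{n_j} c_{ij}}{\sum_{i=1}^{n_j} s_{ij}},$$

where, for study j, c_ij is the number of questionnaires completed by participant i, s_ij is the number of questionnaires allowed for that participant by the study protocol, and n_j is the number of participants.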

A weighted regression was carried out. This approach treats each study's completeness percentage as a summary statistic with unknown variance and hence unknown standard error, so each completeness estimate is weighted by the number of participants in the study. The number of questionnaires (the denominator variable) was also included as a predictor to test whether it predicted completeness. The completeness outcome is then analysed as in a standard regression, but with each estimate of completeness given a weight dependent on its sample size.
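A minimal sketch of this weighted regression is shown below, assuming the extracted data are held in one row per study; the file and column names (completeness, esm_protocol, n_participants) are illustrative rather than the actual DAT headings:

```python
# Weighted regression sketch: each study's completeness (%) is a summary
# statistic weighted by the number of participants contributing to it.
import pandas as pd
import statsmodels.formula.api as smf

studies = pd.read_csv("included_studies.csv")  # hypothetical export of the DAT

fit = smf.wls(
    "completeness ~ C(esm_protocol)",          # one design decision as predictor;
    data=studies,                              # the first category is the reference
    weights=studies["n_participants"],
).fit()
print(fit.summary())                           # beta values, standard errors, p-values
```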

Predictor variables were entered into the weighted regression model. Design features not used in any included study, such as '2.1.1 ESM protocol: signal contingent', were excluded. For continuous predictor variables, cut-points were used to produce broadly equal-sized categories, for example participant gender (0-32% male, 33-65% male, 66-100% male). For each predictor variable, the first category was used as the reference category, and each of the other categories, individually and where relevant in grouped combinations, was compared with the reference.
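Continuing the sketch above, the cut-points for a continuous predictor such as the percentage of male participants could be applied as follows (column names remain illustrative):

```python
# Bin a continuous predictor into the three broadly equal-sized categories used
# in the analysis; the first band later serves as the reference category.
studies["male_band"] = pd.cut(
    studies["percent_male"],
    bins=[0, 32, 65, 100],
    labels=["0-32% male", "33-65% male", "66-100% male"],
    include_lowest=True,
)
```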

A p-value for each predictor was calculated using an ANOVA comparing the model fit (R-squared value) of each single-predictor model with that of the intercept-only model (i.e. no predictor variables). Significant predictors were then explored further by examining the beta values in their models together with the corresponding p-values. These were reported and tabulated to show the impact on completeness and to identify which differences between categories of a predictor were significantly associated with completeness.
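One way to reproduce this comparison with the sketch above is an F test between the single-predictor weighted model and an intercept-only model fitted with the same weights, which is equivalent in spirit to the ANOVA comparison of model fit described here:

```python
# Fit the intercept-only (null) model with the same weights, then compare.
null_fit = smf.wls(
    "completeness ~ 1", data=studies, weights=studies["n_participants"]
).fit()

f_value, p_value, df_diff = fit.compare_f_test(null_fit)
print(f"predictor: F = {f_value:.2f}, p = {p_value:.4f}")
```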

Results

Thirty-eight publications were included in the review. The study selection process is summarised in Fig. 1 using the PRISMA flowchart [27].

Fig. 1 Flowchart of study selection

Characteristics of included publications are presented in Table 1.

Table 1 Summary of included papers (n = 38)

Quality assessment

All 38 studies were assessed for quality. Overall, 14 (37%) were evaluated as high quality and 24 (63%) as low quality.

Participants

The 38 included studies recruited a total of 2,722 participants with psychosis. Overall, 51% (n = 1,380) of participants were male. The mean age was 41 years. Other participant demographic variables were reported inconsistently across studies. Data were collected from 2,643 (97%) participants in the community; the remaining 79 (3%) were inpatients at the time of data collection. Participants had diagnoses including schizophrenia, schizophrenia spectrum disorder, psychosis, non-affective psychotic disorder, bipolar disorder, schizophreniform disorder, schizo-affective disorder, delusional disorder, psychotic disorder not otherwise specified (NOS), depression with psychotic symptoms, first episode psychosis and major depression.

ESM design choices

Design choices are summarised in Table 2. Not all design decisions were reported across all studies.

Table 2 Design decisions used in included studies (n = 38)

Data completeness

The percentage of data completeness was obtained for 29 studies. The remaining nine studies expressed data completeness only as the percentage or number of participants who completed more than a predefined threshold, meaning the exact data completeness percentage could not be determined. Data completeness across included studies is summarised in Fig. 2.

Fig. 2 Data completeness of included studies

Objective 1: Typology of design choices used in ESM studies

Analysis of included publications identified 24 design decisions. Three superordinate themes were identified from the designs: Study context, ESM approach and ESM implementation.

Superordinate theme 1: Study context

The Study Context theme describes decisions made when designing an ESM study which are not ESM-specific. These are shown in Table 3.

Table 3 Superordinate theme 1: Study context

Superordinate theme 2: ESM Approach

The ESM Approach theme describes design decisions relating specifically to experience sampling; these are shown in Table 4.

Table 4 Superordinate theme 2: ESM Approach

Superordinate theme 3: ESM Implementation

The theme of ESM implementation is shown in Table 5.

Table 5 Superordinate theme 3: ESM Implementation

Objective 2: Predictors of data completeness

A weighted regression of design decisions included in the typology was conducted, and the significance of each design choice as a predictor of data completeness is shown in Table 6.

Table 6 Weighted regression of design choices as predictors of data completeness (29 studies)

The regression identified six candidate predictors of data completeness: ESM protocol, length of time per measurement, total time in the study, research team contact, accepted response rate and collecting other data. The findings from the weighted regression for specific values of these six candidate predictors are shown in Table 7.

Table 7 Weighted regression for candidate predictors of data completeness (29 studies)

Table 7 shows that using a time contingent protocol rather than a signal contingent protocol was significantly associated with reduced data completeness by around 12%. Greater data collection burden was consistently associated with reduced data completeness: every extra hour in measurement duration reduced data completeness by 2%, every additional day enrolled in the study reduced data completeness by 0.5%, and collecting extra data alongside ESM data reduced data completeness by 19%. Finally, researcher-initiated contact with participants increased data completeness by 17.5% when compared to participant-initiated contact.

Sensitivity analysis

Fourteen studies were rated as high quality; the quality assessment ratings for all studies are shown in Additional file 2. The quality criteria met by the fewest studies were justification of sample size (met by 3 studies) and rationale for the sampling design (met by 5 studies). The analysis was repeated including only the 10 high quality studies that expressed data completeness as a percentage. This weighted regression identified three design decisions that predicted data completeness: sample size (p = 0.012), other data collected (p = 0.006) and hardware used (p = 0.045). The statistically significant predictors, with beta values, standard errors and p-values, can be found in Additional file 3.

Discussion

This systematic review identified design decisions used in experience sampling studies of people with psychosis. The resulting typology identified three superordinate themes relating to design decisions in ESM studies: Study context, ESM approach and ESM implementation. Weighted regression was then used to identify six design decisions that predicted data completeness: ESM protocol, other data collected, length of time in study, measurement duration, accepted response rate and contact with the research team.

Objective 1: Typology of design choices used in digital ESM studies

A systematic search of published literature on ESM allowed the creation of a typology that accurately represents the methods used in the field [65]. The resulting typology can help researchers to choose designs, help establish a common language and provide the field of ESM research with organisational structure [66].

Four ESM protocols were included in the typology: event-contingent, signal-contingent, time-contingent and hybrid assessments. The first three protocols are commonly cited in the ESM literature [67]: a questionnaire prompt may be sent to participants at pre-defined intervals (time contingent), scheduled at random times (signal contingent) or completed when a predefined event has occurred (event contingent) [17]. ‘Hybrid assessment’ has been used to describe combined protocols.

A sampling protocol is often selected based on the variables of interest [16]. Choice of protocol may depend on whether the variables are discrete, relating to distinct events such as social interactions, or continuous events with less identifiable parameters, such as mood [4]. Discrete events are well suited to event contingent protocols as they have definable beginning and end points. Rather than waiting for a signal or prompt, participants fill out a questionnaire when a discrete event occurs. Time contingent and signal contingent protocols are better suited to measurement of continuous variables. Participants are not required to identify the beginning or end of a pre-defined event in order to complete a questionnaire in time contingent or signal contingent designs [9].

Signal and time contingent protocols can be carried out at fixed or flexible time points [9]. Some authors described their signal contingent designs as stratified [37, 39, 43] or semi-random [16]. In stratified sampling, questionnaires are sent at random time points within pre-programmed time windows, and these parameters are unknown to the participant [58]. An example is a protocol in which at least one beep occurred within each 90-minute window, with a minimum of 15 minutes and a maximum of 3 hours between beeps. The intention of the stratified protocol is to balance the requirement for collecting variable and valid data against participant burden [43]. The data may be more variable than in a time contingent protocol, as the timing of the signal cannot be anticipated by participants, who are therefore less likely to alter their daily life or habits to accommodate the sampling. Some stratified sampling protocols included personalising the daily measurement period to each participant’s waking hours [39, 58].
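As an illustration only, a stratified daily schedule with these example parameters (at least one beep per 90-minute window, gaps between 15 minutes and 3 hours, and an assumed waking-hours window of 08:00 to 22:00) could be generated along the following lines; this is a sketch under those assumptions, not an implementation taken from any included study:

```python
import random

def daily_schedule(start_min=8 * 60, end_min=22 * 60,
                   window=90, min_gap=15, max_gap=180):
    """Return beep times (minutes since midnight) for one day of stratified sampling."""
    beeps = []
    window_start = start_min
    while window_start < end_min:
        window_end = min(window_start + window, end_min)
        low, high = window_start, window_end
        if beeps:                                    # respect the gap constraints
            low = max(low, beeps[-1] + min_gap)
            high = min(high, beeps[-1] + max_gap)
        if low <= high:
            beeps.append(random.randint(low, high))  # one random beep per window
        window_start += window
    return beeps

print(daily_schedule())  # e.g. [493, 601, 684, ...]
```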

Another design choice relating to ESM approach is whether the digital technology used for data collection is provided by the researchers or whether participants are required to use their own smartphones. The present review found no significant difference in data completeness between studies which provided a device and those in which participants used their own phone. There is conflicting opinion about this in the ESM literature [22]. Disadvantages of using participant-owned devices may include increased distraction from other applications on the phone and decreased uniformity of study procedures [68]. Advantages may include reduced study costs and a reduced requirement for participants to meet researchers face to face [69], which could reduce participant burden. A meta-analysis of ESM protocol compliance in substance users found no significant difference in adherence rates between participants who used their own phone and participants who used researcher-provided devices [70].

The typology identified measures used for ESM studies that were derived from psychometrically validated scales and others that were not validated. Many of the measures which were not validated were created by the research team for the purpose of the study. In ESM studies, researchers have often selected items from longer, validated measures and adapted the questions to fit the study time frame [22]. This is often due to the lack of validated measures available for use in ESM studies [71]. Researchers should consider that adding “right now” to a questionnaire item does not necessarily mean that it is appropriate for measuring momentary states [3]. Measuring momentary experiences is different from measuring phenomena included in cross-sectional questionnaires, which are experienced generally and reported retrospectively [9]. When considering which questionnaires to use in ESM studies, researchers should take into account the momentary nature of the phenomena and develop items that accurately capture how they are experienced over the course of the study [72].

The typology also identified support offered to participants once data collection has commenced. This is a common method of encouraging protocol adherence [13]. It can take the form of technical support, motivational support, or emotional support.

This study found no significant association between data completeness and reimbursement to participants. However, reimbursement can involve a number of different strategies, including providing added incentives to participants who achieve high levels of protocol adherence, withholding payment if compliance falls below a certain threshold, and providing payment at regular face to face meetings [22]. The value of participant reimbursement has been found to be positively associated with protocol adherence [15]; however, the authors of that review note that they did not consider the strategy used to provide the incentives, instead calculating a total incentive for each study. Another analysis investigated studies which provided reimbursement proportional to the number of questionnaires completed and found no increase in protocol adherence [70]. The difference in findings indicates that more research is needed in this area, particularly on the influence of different reimbursement strategies on data completeness.

Applicability to different populations

Design choices included in the typology are consistent with those described in suggested ESM reporting guidelines for research in psychopathology [22]. This suggests that the typology can be applied across different mental health populations. Future research is required to validate the typology for use with transdiagnostic groups, for example when collecting data from participants with depression or when measuring discrete variables such as self-harming behaviours [73], for which event contingent protocols may be used more frequently [4].

Objective 2: Design decisions that predict data completeness

The ESM protocol used predicted data completeness. Using a signal contingent protocol compared to a time contingent protocol was shown to increase data completeness by around 12%. This is in contrast with previous research which has shown that signal contingent sampling may be perceived as more burdensome by study participants [74] leading to lower levels of adherence compared to other protocols [75]. The authors suggest that higher levels of predictability afforded by time contingent protocols may increase adherence as participants are able to integrate responding to questionnaires into their daily routine [75]. Knowledge of when to expect the questionnaire prompts may allow participants to plan their daily tasks in accordance with the scheduled questionnaires [15].

A study of ESM in participants with substance dependence found that participants may prefer to isolate themselves, or to be in a quiet environment, when responding to questionnaires [76]. In this case, the additional burden of anticipating the signal at a certain time and finding a quiet environment may account for the lower data completeness with a time contingent protocol. Similarly, the psychosis population in our review may have more cognitive impairments, such as reduced attention, meaning that the potential for integrating data collection into daily life is reduced and an immediate response to a signal contingent assessment is easier to provide. As there are advantages and disadvantages to each ESM protocol, and inconsistent findings regarding their effects on data completeness, it has been suggested that the choice of protocol should be based on the requirements of the study [15]. This may involve choosing a protocol based on the nature of the variables of interest [4].

Design decisions relating to scheduling were found to influence data completeness. Longer study lengths and longer daily measurement durations predicted lower levels of data completeness. For every day participants were enrolled in a study, data completeness reduced by 0.5%. Similarly, every additional hour of measurement duration per day reduced data completeness by 2%. These findings are consistent with previous research. One study analysed predictors of adherence to ESM protocols in a pooled data set of 10 ESM studies comprising 1,717 participants, of whom 15% had experienced psychosis, and found that ESM protocol adherence declined over the duration of study days [13]. More generally, the problems experienced by people living with psychosis, such as negative symptoms and amotivation, may require lower burden data collection procedures.

Some studies customised the time period per day during which sampling took place for each participant, for example by personalising the daily measurement period to each participant’s waking hours [29, 34, 36]. Sampling took place for the same number of hours per day for each participant but began and ended at different times. This review only included the total number of hours per day that sampling took place in the analysis. Future research could investigate the relationship between personalised scheduling and data completeness, which would allow diurnal variation in symptomatology to be integrated into the data collection schedule. For example, an individual who is more preoccupied with hallucinatory experiences in the morning may be more able to respond to data collection prompts later in the day.

Monitoring participants once ESM has commenced has been recommended in order to encourage protocol adherence [22]. Support from researchers during the data collection phase is initiated either by researchers or by participants. Researcher-initiated contact with participants throughout the duration of the study increased data completeness by 17.5% compared with participant-initiated contact. These findings support active researcher contact once data collection has commenced, which may be particularly beneficial if participants find the study procedures burdensome.

Strengths and limitations

One strength of this study is the rigorous search strategy, which was designed in collaboration with two information specialists with expertise in conducting systematic review searches in the field of mental health. Another strength is the use of several analysts with differing expertise, including clinical expertise, mental health services research, and technology research and design.

Several limitations can be identified. Only studies which reported data completeness, or for which it could be calculated, were included; studies that did not report this could have been included for Objective 1, which may have increased the generalisability of the findings. Similarly, studies were only included if they used digital ESM, which may also have limited generalisability. Title, abstract and full paper sifting was carried out by only one author (ED), which may introduce inclusion bias. In the absence of an appropriate quality checking tool, recommended reporting guidelines were used instead, which may not fully capture quality. Finally, a meta-regression could not be carried out, and the weighted regression conducted instead does not account for estimates of variance, meaning that conclusions drawn from the analysis need to be interpreted with caution. Additionally, only a small number of studies (n = 10) were included in the sensitivity analysis.

Conclusions

The study addresses a knowledge gap related to design decisions for ESM studies recruiting people with psychosis. The typology of design choices used in ESM studies identifies key design decisions to consider when designing and implementing an experience sampling study. The typology could be used to inform the design of future experience sampling studies in transdiagnostic mental health populations. The review also identifies a number of predictors of data completeness. This knowledge could help future researchers to increase the likelihood of achieving fuller data sets.

Future research might seek to add additional design choices to the typology and to refine design decisions as the field advances. Future research may also examine how the typology is used by researchers when designing ESM studies. Researchers may also validate the typology for use with different mental health populations.