Mood disorders are highly prevalent and impose a substantial cost on individuals and on society at large (Steel et al., 2014; Vigo et al., 2016; Wittchen, 2012). Real-time detection of developing mood disorders would therefore be of great clinical benefit, as it would allow for timely intervention to prevent an episode from occurring or to mitigate its severity.

Research investigating whether people are at risk of developing mood disorders has mostly focused on time-invariant characteristics, such as genetics (Bigdeli et al., 2017; Craddock & Forty, 2006; Levinson, 2006) and personality (Klein et al., 2011). An obvious disadvantage of such time-invariant characteristics is that the associated risk remains constant over time. Time-invariant characteristics therefore provide no information on when a mood disorder might actually start to develop, and thus on when intervention is necessary. To obtain such information, one needs to focus on time-varying characteristics. A prime candidate is emotional experience, as it is inherently dynamic and thus fluctuates over time (Davidson et al., 2000; Frijda, 2007; Kuppens, 2015; Larsen, 2000). Moreover, disruptions of (these fluctuations in) emotional experience are considered to be a core symptom of mood disorders (Houben et al., 2015). Disruptions in the mean and variance of emotional experience, in particular, appear to be indicative of mood disorders (Dejonckheere et al., 2019).

There is ample evidence that these fluctuations and disruptions of emotional experience can be captured with the experience sampling method (ESM; Myin-Germeys et al., 2009, 2018). In ESM studies, participants are asked to report on their momentary affect several times a day, for multiple days. Fluctuations and disruptions of emotional experience are then captured by computing person-specific means and variances of the affective variables, either across the complete ESM period or within multiple smaller time windows (Cabrieto et al., 2019; Schat et al., 2021), yielding time-varying characteristics. Between-person comparisons of the person-specific means and variances show that healthy individuals generally have higher levels of positive affect and lower levels of negative affect than depressed individuals (Dejonckheere et al., 2018; Hollenstein et al., 2013; Houben et al., 2015). Moreover, Dejonckheere et al. (2019) found that the person-specific variance makes a unique contribution, over and above the mean levels, to the prediction of depressive symptoms. Importantly, retrospective within-person comparisons of the time-varying means and variances suggest that emotional fluctuation patterns indeed often change prior to a depressive episode. Specifically, increases in the level of negative affect and decreases in the level of positive affect have been documented, as well as increases in the variance of negative emotions (Cabrieto et al., 2018, 2019; Nelson et al., 2017; Wichers et al., 2016). In a single-subject study by Wichers et al. (2020), the variance of the item ‘feeling down’ increased before a depressive episode, even after the data were detrended.

Retrospective methods (e.g., change point detection methods, moving window techniques) are very useful for determining whether changes indeed occur prior to a depressive episode and in which variables such changes occur. From a prevention perspective, however, they are of no use, because they detect disruptions far too late: they are applied after data collection has finished and can only detect a change once a relatively large number of data points before and after the change are available. To intervene in time, prospective methods are needed that can detect relevant changes in time-varying characteristics in real time. Recent work has shown that statistical process control (SPC) procedures are particularly promising (Schat et al., 2021; Smit et al., 2019, 2022; Smit & Snippe, 2022; Snippe et al., 2022).

SPC originates from industry, where the procedures were initially developed to monitor characteristics of production processes. When applying SPC procedures in practice, one implements two distinct data collection and analysis phases (Montgomery, 2009). In phase I, the goal is to capture the natural variability of a process that remains in-control. To this end, a sample of in-control scores is gathered and used to estimate the mean (\({\mu }_{1}\)) and standard deviation (\({\sigma }_{1}\)) of in-control process scores. These estimates are used to compute control limits. In phase II, one starts monitoring the continuously incoming data, which might go out-of-control at some point. The phase II scores are compared to the control limits, to see whether and when a change occurs. As long as the phase II scores fall within the control limits, the process is considered to be in-control. When a score falls beyond one of the control limits, the process is considered to be out-of-control. Nowadays, SPC procedures are used in various domains, including climate change (Hackney et al., 2013), agriculture (Mertens et al., 2008), pharmaceutics (Silva et al., 2017), and health care (Perla et al., 2021; Thor et al., 2007). All these applications almost exclusively focus on mean changes. The exponentially weighted moving average (EWMA) procedure (Roberts, 1959) has been shown to be particularly useful for this purpose, as it is quite robust against assumption violations (Borror et al., 1999; Montgomery, 2009; Stoumbos & Reynolds, 2000) and outperforms other procedures in detecting smaller mean changes (Lucas & Saccucci, 1990; Montgomery, 2009; Roberts, 1959).

Recently, the EWMA procedure has been introduced in the field of psychopathology as well, where it was used to detect mean level changes in the ESM data of single persons. Whereas the first studies monitored the scores at the individual measurement occasions (i.e., raw scores; Smit et al., 2019), more recent work focused on day level values (i.e., day averages; Schat et al., 2021; Smit et al., 2022; Smit & Snippe, 2022). Simulation studies showed that monitoring day averages has several benefits, including an increase in the effect size of the mean changes (Schat et al., 2021). Empirical findings on a sample of 41 formerly depressed patients in remission were also promising (Snippe et al., 2022). These patients (gradually) discontinued their antidepressants while providing ESM data for a period of four months. Twenty-two patients eventually relapsed into depression. EWMA analyses of day averages showed that increases in high arousal negative affect and increases in negative repetitive thinking were early signs of recurrence. Increases in high arousal negative affect were found to be the most specific early signs of recurrence, and were detected in ten out of 22 patients (45%) before recurrence and in two out of 19 patients (11%) who stayed in remission. Increases in negative thinking, on the other hand, were found to be the most sensitive early signs of recurrence, and were detected in 18 out of 22 patients (82%) before recurrence and in eight out of 19 patients (42%) who stayed in remission.
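The reported counts can be turned into the usual screening metrics with plain arithmetic. The sketch below assumes that a signal in a relapsing patient counts as a true positive and a signal in a patient who stayed in remission counts as a false positive; this is our reading of the numbers, not a definition taken from Snippe et al. (2022), and the function name is hypothetical.

```python
def detection_rates(flagged_relapse, n_relapse, flagged_remission, n_remission):
    """Return (sensitivity, specificity) of an early-warning signal.

    sensitivity: share of relapsing patients flagged before recurrence
    specificity: share of patients in remission who were NOT flagged
    """
    sensitivity = flagged_relapse / n_relapse
    specificity = 1 - flagged_remission / n_remission
    return sensitivity, specificity


# Counts as summarized in the paragraph above.
hana_sens, hana_spec = detection_rates(10, 22, 2, 19)  # high arousal negative affect
rt_sens, rt_spec = detection_rates(18, 22, 8, 19)      # negative repetitive thinking
print(f"high arousal NA:   sensitivity {hana_sens:.0%}, specificity {hana_spec:.0%}")
print(f"negative thinking: sensitivity {rt_sens:.0%}, specificity {rt_spec:.0%}")
```

Under this reading, high arousal negative affect trades sensitivity (45%) for a specificity of about 89%, whereas negative thinking reaches 82% sensitivity at a specificity of about 58%.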

Building on these promising results, this paper aims to further develop the EWMA-based SPC toolset for screening for early signs of depression. Since both between- and within-person studies have pinpointed the unique contribution of the variance in the prediction of depression, it makes sense to find ways to additionally monitor the variance. Therefore, we introduce SPC procedures that can detect variance changes in real-time, as well as procedures that can detect both mean and variance changes. Specifically, we inspect and compare two types of EWMA approaches.

First, we investigate whether the tailor-made approach of monitoring day statistics can also be used to scan for changes in variability. This approach implies that one selects and computes a day statistic of variability and monitors mean changes in this statistic using the EWMA procedure. Specifically, a weighted average of the statistic of interest is monitored. Additionally, to focus on both mean and variance changes, the multivariate extension of the EWMA procedure (MEWMA) is applied to both the day averages and a day statistic of variability. Second, we investigate the performance of less frequently used EWMA-type procedures that have been developed to detect a combination of mean and variance changes in the raw scores. When selecting these procedures, we used two criteria: the calculation of the control limits should be readily available in R packages, and the procedure should allow for the monitoring of individual observations (i.e., one observed score per time point). This way, we ended up with EWMA-\({S}^{2}\), EWMA-ln(\({S}^{2}\)), and EWMA-\(\overline{X }\)-\({S}^{2}\) (Crowder & Hamilton, 1992; Gan, 1995; Knoth & Schmid, 2002; MacGregor & Harris, 1993; Reynolds & Stoumbos, 2001). We investigate these procedures to see whether the tailor-made approach of monitoring day statistics performs better than, or at least as well as, existing EWMA-type procedures. The EWMA-\({S}^{2}\) and EWMA-ln(\({S}^{2}\)) procedures monitor a weighted average of the squared deviation from the in-control phase I mean, or of its natural logarithm, respectively.
Note that although an EWMA-style procedure that focuses exclusively on variance changes exists (EWMV; MacGregor & Harris, 1993), we will not consider this procedure, as the computation of its control limits is challenging and not available in R. The EWMA-\(\overline{X }\)-\({S}^{2}\) procedure is a joint monitoring scheme, which considers two separate EWMA charts and looks at when the first out-of-control score occurs in either of the two, while controlling the type I error by using slightly different control limits. Specifically, this procedure applies the EWMA and EWMA-\({S}^{2}\) procedures to the raw scores, to detect mean and variance changes, respectively. Other joint monitoring schemes have been proposed that consider a single EWMA-type chart (for an overview, see Cheng & Thaga, 2006; McCracken & Chakraborti, 2013); however, the control limits for these procedures cannot be easily computed. In the Discussion, we touch upon other procedures that are less restrictive but therefore more involved (i.e., non-parametric procedures).

The remainder of this paper is structured as follows. First, we introduce the two types of approaches for monitoring variance changes, or the combination of mean and variance changes. Next, we report on a simulation study where we evaluate the performance of the different procedures at detecting mean and variance changes. Lastly, we present a discussion of the results and directions for future research.

SPC procedures for detecting mean and variance changes

In this section, we present the two types of SPC approaches to detect mean and variance changes in ESM data. To set the stage, we first discuss a benchmark ESM data set that is often used to illustrate new techniques and briefly recapitulate the key ideas of the EWMA and MEWMA procedures. Next, we focus on the first approach in which the (M)EWMA procedure is applied to day statistics (i.e., averages and measures of variability). We elaborate on the distributions of the day statistics and the monitored scores, since they will impact performance. Finally, we introduce the EWMA-\({S}^{2}\), EWMA-ln(\({S}^{2}\)), and EWMA-\(\overline{X }\)-\({S}^{2}\) procedures that can be applied to the raw scores.

ESM data

The ESM data were provided by a 57-year-old male with a history of major depressive disorder, who had been using antidepressants for the previous 8.5 years (Groot, 2010; Wichers et al., 2016). The participant took part in a 239-day ESM study, in which he underwent a dose reduction of the antidepressant venlafaxine. The experiment consisted of three parts: a baseline period (4 weeks), a double-blind period with the dose reduction (14 weeks), and a follow-up period (16 weeks). Over a period of 8 weeks within the double-blind period, the antidepressant dose was reduced from 150 to 0 mg. Specifically, the dose reduction started on day 42 and ended on day 98, to which both the researchers and the participant were blind. Around day 127, changes in the participant’s depressive symptoms were observed and he relapsed into depression. During the entire study, the participant was asked to report on his momentary affective states ten times a day; during the night no data were obtained.

We focus on the negative affective state ‘restless’ (Smit et al., 2019), as it is hypothesized to change long before an actual relapse; it was measured on a scale from 1 (not) to 7 (very). Figures 1a and 1b show the raw scores of ‘restless’ and the associated boxplots, for phase I and phase II separately. Given the setup of the experiment, we used the ESM data of the first 41 days (i.e., measurement occasions 1-279) as the phase I data and the data of the remaining days (i.e., measurement occasions 280-1473) as the phase II data. The distribution of ‘restless’ is right-skewed. The mean of ‘restless’ increases from 1.56 in phase I to 2.15 in phase II, and its variance increases from .51 to .88.

Fig. 1

Raw scores of the affective state ‘restless’ during a 239-day antidepressant reduction ESM study. A Raw scores of the affective state ‘restless’ at the individual measurement occasions. The varying background shading indicates the experimental periods. The start (day 42, measurement occasion 280) and end (day 98, measurement occasion 666) of the antidepressant dose reduction scheme are indicated by the first and second dashed vertical lines. The relapse into depression (day 127, measurement occasion 823) is indicated by the third dashed vertical line. B Boxplots of the scores of ‘restless’, for phase I (i.e., days 1-41) and phase II (i.e., from day 42 onwards)

EWMA and MEWMA procedures

EWMA procedure

We start by explaining the computation of the EWMA scores that are monitored in phase II. Afterwards, we discuss how the data in phase I are used to obtain the control limits needed in phase II.

Phase II: Monitoring

The EWMA procedure (Roberts, 1959) combines past information with current information, by monitoring a weighted average of the current score and all past scores. The current score receives the highest weight, and the weights of the past scores decrease exponentially over time. Specifically, the EWMA procedure computes the exponentially weighted moving average \({z}_{i}\) at each measurement occasion \(i\), where \(i\) ranges from 1 to \(t\):

$${z}_{i}=\lambda {x}_{i}+\left(1-\lambda \right){z}_{i-1} ,$$

where \({x}_{i}\) denotes the observed score, and \({z}_{i-1}\) refers to the EWMA score at the previous measurement occasion. \({z}_{0}\) equals the estimated mean \({\widehat{\mu }}_{1}\) based on the phase I data. The constant \(\lambda\) \((0<\lambda \le 1)\) denotes the weight given to \({x}_{i}\). Lower values of \(\lambda\) are typically recommended for detecting small mean changes. Values between .05 and .15 have been shown to work well with ESM data (Schat et al., 2021; Smit et al., 2022; Snippe et al., 2022).
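The recursion above can be sketched in a few lines of Python. This is a minimal illustration rather than the authors' code; the function name `ewma_scores` is hypothetical, and the starting value is passed in as the phase I mean estimate.

```python
def ewma_scores(x, mu0, lam=0.1):
    """Exponentially weighted moving averages z_1, ..., z_t.

    z_i = lam * x_i + (1 - lam) * z_{i-1}, starting from z_0 = mu0,
    where mu0 plays the role of the estimated phase I mean.
    """
    if not 0 < lam <= 1:
        raise ValueError("lam must satisfy 0 < lam <= 1")
    z_prev = mu0
    out = []
    for xi in x:
        z_prev = lam * xi + (1 - lam) * z_prev  # current score gets weight lam
        out.append(z_prev)
    return out
```

For example, with \(\lambda = .5\), \(z_0 = 1\), and three scores of 2, the EWMA scores are 1.5, 1.75, and 1.875, moving geometrically toward the new level. With \(\lambda = 1\), the procedure simply returns the raw scores, which shows why smaller \(\lambda\) values smooth more.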

Phase I: Parameter estimation and control limits

Using the phase I data, we obtain estimates of the mean (\({\widehat{\mu }}_{1}\)) and standard deviation (\({\widehat{\sigma }}_{1}\)) of in-control data. These estimates are used to calculate the symmetric upper and lower EWMA control limits:

$$UCL={\widehat{\mu }}_{1}+L{\widehat{\sigma }}_{1}\sqrt{\frac{\lambda }{\left(2-\lambda \right)}}$$

and

$$LCL={\widehat{\mu }}_{1}-L{\widehat{\sigma }}_{1}\sqrt{\frac{\lambda }{\left(2-\lambda \right)}} .$$
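Phase I estimation and phase II monitoring with these symmetric limits can be sketched as follows. This is a minimal sketch under stated assumptions: \({\widehat{\sigma }}_{1}\) is taken to be the ordinary sample standard deviation (the text does not specify the estimator), \(L\) is simply an input (in practice it would come from a routine such as xewma.crit in the spc package), and the helper names `ewma_limits` and `first_alarm` are hypothetical.

```python
import math
import statistics


def ewma_limits(phase1, lam, L):
    """Return (mu1_hat, LCL, UCL) for the asymptotic EWMA limits."""
    mu1 = statistics.mean(phase1)
    sigma1 = statistics.stdev(phase1)  # sample SD (n - 1); an assumption
    half_width = L * sigma1 * math.sqrt(lam / (2 - lam))
    return mu1, mu1 - half_width, mu1 + half_width


def first_alarm(phase2, mu1, lcl, ucl, lam):
    """1-based index of the first phase II EWMA score beyond a limit, or None."""
    z = mu1  # z_0 equals the phase I mean estimate
    for i, x in enumerate(phase2, start=1):
        z = lam * x + (1 - lam) * z
        if z < lcl or z > ucl:
            return i
    return None
```

Because the limits are computed once from phase I and then frozen, every phase II score can be judged as soon as it arrives, which is what makes the procedure prospective.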

The parameter \(L\) determines the width of the control limits. In the Section Assessing (M)EWMA performance, we explain how to choose the \(L\) parameter to obtain a desired SPC performance for data that remain in-control.

MEWMA procedure

If one is interested in monitoring more than one variable simultaneously, the multivariate extension of the EWMA procedure can be used (MEWMA; Lowry et al., 1992).

Phase II: Monitoring

MEWMA computes the multivariate exponentially weighted moving averages \({{\varvec{z}}}_{i}\) at each measurement occasion \(i\):

$${{\varvec{z}}}_{i}=\lambda {{\varvec{x}}}_{i}+\left(1-\lambda \right){{\varvec{z}}}_{i-1}.$$

\({{\varvec{x}}}_{i}\) is the vector that contains the scores of the \(p\) tracked variables at measurement occasion \(i\). The starting vector \({{\varvec{z}}}_{0}\) is set to the phase I averages \({\widehat{{\varvec{\mu}}}}_{1}\). The multivariate scores are then transformed into a univariate score \({T}_{i}^{2}\) by computing the deviation between \({{\varvec{z}}}_{i}\) and \({\widehat{{\varvec{\mu}}}}_{1}\), while accounting for the linear dependencies between the monitored variables through the covariance matrix:

$${T}_{i}^{2}={\left({{\varvec{z}}}_{i}-{\widehat{{\varvec{\mu}}}}_{1}\right)}^{\prime}{{\varvec{\Sigma}}}_{{{\varvec{z}}}_{i}}^{-1}\left({{\varvec{z}}}_{i}-{\widehat{{\varvec{\mu}}}}_{1}\right).$$

\({{\varvec{\Sigma}}}_{{{\varvec{z}}}_{i}}\) denotes the covariance matrix at measurement occasion \(i\), calculated as:

$${{\varvec{\Sigma}}}_{{{\varvec{z}}}_{i}}=\frac{\lambda }{2-\lambda }\left[1-{\left(1-\lambda \right)}^{2i}\right]{\widehat{{\varvec{\Sigma}}}}_{1},$$

where \({\widehat{{\varvec{\Sigma}}}}_{1}\) is the estimated covariance matrix of the phase I data. The covariance matrix, which reflects linear dependencies, influences the speed at which changes are detected. A change is detected faster (i.e., the \({T}_{i}^{2}\) score is larger) when two independent processes deviate in the same direction from the phase I averages than when two strongly positively correlated processes show a similar deviation. A change in two strongly negatively correlated processes is detected even faster when they deviate in the same direction (for an example, see Schat et al., 2021).
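The effect of the covariance matrix on \({T}_{i}^{2}\) can be checked numerically. The sketch below is restricted to two monitored variables (e.g., a day average and one day statistic of variability), so the \(2 \times 2\) inverse can be written in closed form; the function names `mewma_update` and `mewma_t2` are hypothetical and this is not the spc implementation.

```python
def mewma_update(z_prev, x, lam):
    """One MEWMA step: z_i = lam * x_i + (1 - lam) * z_{i-1}, elementwise."""
    return [lam * xj + (1 - lam) * zj for xj, zj in zip(x, z_prev)]


def mewma_t2(z, mu1, sigma1, lam, i):
    """T^2 at occasion i for a bivariate MEWMA vector z.

    sigma1 is the 2x2 phase I covariance matrix [[a, b], [b, d]]; it is
    scaled by lam / (2 - lam) * (1 - (1 - lam)**(2 i)) as in the text.
    """
    factor = lam / (2 - lam) * (1 - (1 - lam) ** (2 * i))
    a = sigma1[0][0] * factor
    b = sigma1[0][1] * factor
    d = sigma1[1][1] * factor
    det = a * d - b * b
    if det <= 0:
        raise ValueError("covariance matrix must be positive definite")
    e1, e2 = z[0] - mu1[0], z[1] - mu1[1]
    # Quadratic form e' S^{-1} e using the closed-form 2x2 inverse.
    return (d * e1 * e1 - 2 * b * e1 * e2 + a * e2 * e2) / det
```

For the same deviation in the same direction, a covariance matrix with correlation \(-.8\) gives the largest \({T}_{i}^{2}\), zero correlation an intermediate value, and correlation \(.8\) the smallest, matching the ordering described above.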

Phase I: Parameter estimation and control limits

There is no simple closed-form expression for the MEWMA control limit. We obtained it using the mewma.crit function of the spc package in R (Knoth, 2017, 2020).

Assessing (M)EWMA performance

The performance of SPC procedures is usually assessed in terms of the run length (\(RL\)), which indicates the measurement occasion at which the first out-of-control score in phase II is detected (Montgomery, 2009). The \(RL\) can vary a great deal across replicates from the same (changing) process. Therefore, the average run length (\(ARL\)) is usually reported, where we distinguish between the in-control \({ARL}_{0}\) and the out-of-control \({ARL}_{1}\). The \({ARL}_{0}\) denotes the average run length given that the process remains in-control in phase II. Ideally, this value is high, as any out-of-control score is then a false alarm. The \({ARL}_{1}\), on the other hand, denotes the average run length given that the process goes out-of-control at the start of phase II. This value should preferably be low, as a lower \({ARL}_{1}\) means that the change is detected faster (i.e., it reflects the power of the SPC procedure). In the EWMA procedure, the \(L\) value in the control limits is related to the \(ARL\) values. For a fixed \(\lambda\) value, a higher \(L\) leads to wider control limits and thus higher expected \({ARL}_{0}\) and \({ARL}_{1}\) values. The \(L\) value can be obtained using the xewma.crit function of the spc package in R (Knoth, 2020).
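The relation between \(L\), \({ARL}_{0}\), and \({ARL}_{1}\) can be illustrated with a small Monte Carlo simulation. This sketch assumes independent, normally distributed data with known in-control mean 0 and SD 1 and uses the asymptotic limits; it only approximates the \(ARL\) for a given \(L\), whereas a routine such as xewma.crit solves the inverse problem (finding \(L\) for a target \({ARL}_{0}\)) numerically. The function name `simulate_arl` is hypothetical.

```python
import math
import random


def simulate_arl(lam, L, shift=0.0, reps=500, max_run=100_000, seed=1):
    """Monte Carlo average run length of a two-sided EWMA chart.

    In-control data are N(0, 1); `shift` is a mean change (in SD units)
    present from the first phase II occasion onward. shift = 0 estimates
    ARL0, shift != 0 estimates ARL1. Runs are capped at `max_run`.
    """
    rng = random.Random(seed)
    half_width = L * math.sqrt(lam / (2 - lam))  # asymptotic control limits
    total = 0
    for _ in range(reps):
        z, run = 0.0, 0  # z_0 = in-control mean
        while run < max_run:
            run += 1
            z = lam * rng.gauss(shift, 1.0) + (1 - lam) * z
            if abs(z) > half_width:
                break
        total += run
    return total / reps
```

Increasing \(L\) for fixed \(\lambda\) raises the simulated \({ARL}_{0}\), and adding a mean shift shortens the run length, in line with the description above.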

The \({ARL}_{0}\) value needs to be chosen prior to obtaining the control limits in phase I. A commonly chosen value for the \({ARL}_{0}\) that we also use in this paper is 370. However, a higher or lower \({ARL}_{0}\) value may be more useful in some situations. The cost of an intervention following an out-of-control score should for instance be taken into account when deciding on a suitable \({ARL}_{0}\) value. If the cost of intervention is low, a low \({ARL}_{0}\) value can be chosen to ease the detection of changes. On the other hand, if the cost of intervention is high, a high \({ARL}_{0}\) value can be chosen to limit the number of false alarms and thus unnecessary interventions.
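As background (not stated in the text above), the conventional value of 370 is usually traced back to the Shewhart chart with three-sigma limits: for independent normal data, a single score falls beyond \(\pm 3\sigma\) with probability of about .0027, so the expected time to a false alarm is about \(1/.0027 \approx 370\). A minimal check, with a hypothetical function name:

```python
import math


def shewhart_arl0(k=3.0):
    """In-control ARL of a Shewhart chart with +/- k sigma limits, N(0, 1) data."""
    p_false_alarm = math.erfc(k / math.sqrt(2))  # P(|Z| > k) for Z ~ N(0, 1)
    return 1 / p_false_alarm
```

With the default \(k = 3\), this returns a value of roughly 370.4.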

(M)EWMA assumptions

Like all SPC procedures, the EWMA procedure is based on a number of assumptions. The expected \({ARL}_{0}\) (e.g., 370) will only be obtained if all these assumptions are adhered to, which is unlikely when using ESM data. First, it is assumed that the mean and standard deviation of the in-control process are known. This is almost never the case in ESM research, and explains why we obtain estimates for the mean and standard deviation (i.e., \({\widehat{\mu }}_{1}\) and \({\widehat{\sigma }}_{1}\)) based on the phase I data and use these estimates to compute control limits. The more phase I data, the more accurate the estimates and thus the control limits will be (Jensen et al., 2006; Saleh et al., 2015). Second, it is assumed that the raw scores are normally distributed. Whereas positive affect items are often rather normally distributed, negative affect items are typically strongly positively skewed (Heininga et al., 2019), as we also saw for our example data (see Fig. 1). Although non-normal distributions generally reduce the performance of SPC procedures in terms of false alarms and power, this is less the case for the EWMA and MEWMA procedures if one uses a small \(\lambda\) value (i.e., between .05 and .10 for EWMA and between .02 and .05 for MEWMA), as this smooths away deviations from normality (Borror et al., 1999; Schat et al., 2021; Stoumbos & Reynolds, 2000; Stoumbos & Sullivan, 2002; Testik et al., 2003). Third, data are assumed to be independent. Observed ESM scores, however, are usually serially dependent rather than independent (Houben et al., 2015; Kuppens et al., 2010). Low levels of autocorrelation can already lead to sub-optimal control limits, which in turn influences the performance of SPC procedures (for more information, see Montgomery, 2009; Schat et al., 2021). 
Further complicating the impact of serial dependence, the observed ESM scores are usually not equidistant, as participants do not report on their affective states during the night and may miss measurement occasions during the day.

To deal with the second and third assumption violations, Schat et al. (2021) proposed to compute and monitor day averages rather than track the raw scores. This was shown to be an efficient way to decrease autocorrelation and handle missingness, and it renders the data less skewed. As an additional benefit, the effect size of mean changes increases as within-day fluctuations are averaged out, increasing the power to detect small mean changes.
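The effect size argument can be made concrete with idealized arithmetic. The sketch below assumes that the \(n\) scores of a day share a common within-day SD and a common pairwise correlation \(\rho\) (\(\rho = 0\) meaning independence), so that the variance of a day average is \(\sigma^2(1+(n-1)\rho)/n\); real ESM data will deviate from this, and the function name is hypothetical.

```python
import math


def day_average_effect_size(delta, sigma_within, n_per_day, rho=0.0):
    """Standardized mean change for raw scores vs. day averages.

    Assumes n equicorrelated scores per day with SD sigma_within and
    common correlation rho. Returns (raw_effect, day_average_effect).
    """
    raw = delta / sigma_within
    var_mean = sigma_within ** 2 * (1 + (n_per_day - 1) * rho) / n_per_day
    return raw, delta / math.sqrt(var_mean)
```

For a mean change of half a within-day SD and seven independent scores per day, the standardized change grows from 0.5 to about 1.32 (a factor \(\sqrt{7}\)); positive within-day correlation shrinks this gain.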

Approach 1: Apply (M)EWMA to detect mean changes in various day statistics

The first approach builds on the method of Schat et al. (2021), which we will refer to as the day-\(\overline{x }\) method. For the purpose of detecting variance changes, we apply EWMA to one of the following three day statistics of variability: day variances (day-\({s}^{2}\)), day standard deviations (day-\(s\)), and the natural logarithm of the day standard deviations (day-ln(\(s\))). These three statistics are commonly used in the SPC literature for detecting variance changes (see e.g., Zwetsloot & Ajadi, 2019). To screen for mean and variance changes simultaneously, we propose to apply the MEWMA procedure to the day averages and one of the day variability statistics. The resulting approaches will be referred to as day-\(\overline{x }\)-\({s}^{2}\), day-\(\overline{x }\)-\(s\), and day-\(\overline{x }\)-ln(\(s\)).

Application of the (M)EWMA procedures to day statistics of the ESM data

We will illustrate the implications of these approaches by applying them to the ‘restless’ scores introduced in Section ESM data (see also Fig. 1). To obtain relatively reliable estimates of day variability, we only selected the days with five or more measurement occasions. This resulted in 35 phase I days and 159 phase II days, and the transition to depression occurred on day 111 (i.e., day 76 in phase II). The R code for applying the SPC procedures to the ESM data is available on OSF at https://osf.io/y9ncq/. An additional example with simulated data can also be found on OSF.
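Computing the day statistics from raw scores, including the five-occasion filter, can be sketched as follows. The layout (parallel sequences of day labels and scores, with `None` for missed prompts) and the function name are assumptions for illustration. The text does not say how days without any variability are handled, for which ln(\(s\)) is undefined; this sketch keeps such days and returns `None` for their ln(\(s\)) value.

```python
import math
import statistics
from collections import defaultdict


def day_statistics(days, scores, min_occasions=5):
    """Per-day (mean, s2, s, ln_s) for days with enough non-missing scores.

    days, scores: parallel sequences (day label per measurement occasion).
    Missing scores (None) are skipped; days with fewer than min_occasions
    valid scores are dropped. ln_s is None when s == 0 (an assumption).
    """
    by_day = defaultdict(list)
    for day, x in zip(days, scores):
        if x is not None:
            by_day[day].append(x)
    out = {}
    for day, xs in by_day.items():  # insertion order = order of first appearance
        if len(xs) < min_occasions:
            continue
        s2 = statistics.variance(xs)  # sample variance, n - 1 denominator
        s = math.sqrt(s2)
        ln_s = math.log(s) if s > 0 else None
        out[day] = (statistics.mean(xs), s2, s, ln_s)
    return out
```

The resulting series of day means and day variability statistics can then be fed to the EWMA or MEWMA procedures in place of the raw scores.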

Distributions of the day statistics

Given that the performance of EWMA and MEWMA is (slightly) impacted by the distribution of the monitored scores (see Section (M)EWMA assumptions), we inspect the distributions of the three day statistics of variability. Figures 2a-c show the boxplots of the day-\({s}^{2}\), day-\(s\) and day-ln(\(s\)) scores, respectively, for phase I and phase II. For all three day statistics of variability, the scores in phase II are higher than those in phase I. The day-\({s}^{2}\) distribution is positively skewed (Fig. 2a). This makes sense if one realizes that, if the raw scores were independent and normally distributed, the day-\({s}^{2}\) scores would be expected to follow a scaled chi-square distribution with \(n\) - 1 degrees of freedom: \(\frac{(n-1){s}^{2}}{{\sigma }^{2}} \sim {\chi }_{n-1}^{2}\), where \(n\) equals the number of measurement occasions per day. The day-\(s\) distribution is slightly less skewed (Fig. 2b). Indeed, under the same assumptions, the day-\(s\) scores would follow a scaled chi distribution with \(n\) - 1 degrees of freedom: \(\frac{\sqrt{n-1}s}{\sigma } \sim {\chi }_{n-1}\), which is less positively skewed than the chi-square distribution. Due to the natural logarithm transformation, the day-ln(\(s\)) scores are approximately normally distributed in both phase I and phase II (Fig. 2c).
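Under the idealized assumption of independent, normally distributed raw scores, the skewness comparison can be checked with closed-form expressions: \(\sqrt{8/k}\) for \({\chi }_{k}^{2}\), and \(\mu (1-2{\sigma }^{2})/{\sigma }^{3}\) for \({\chi }_{k}\), with \(\mu =\sqrt{2}\,\Gamma ((k+1)/2)/\Gamma (k/2)\) and \({\sigma }^{2}=k-{\mu }^{2}\). The function names below are hypothetical.

```python
import math


def chisq_skew(k):
    """Skewness of a chi-square distribution with k degrees of freedom."""
    return math.sqrt(8 / k)


def chi_skew(k):
    """Skewness of a chi distribution with k degrees of freedom."""
    # Mean of chi_k via log-gamma for numerical stability.
    mu = math.sqrt(2) * math.exp(math.lgamma((k + 1) / 2) - math.lgamma(k / 2))
    var = k - mu ** 2
    sd = math.sqrt(var)
    return mu / sd ** 3 * (1 - 2 * var)
```

With roughly seven to eight occasions per day (\(k = n - 1 \approx 6\)), this gives a skewness of about 1.15 for the chi-square reference distribution and about 0.32 for the chi distribution, consistent with day-\(s\) being less skewed than day-\({s}^{2}\).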

Fig. 2

Boxplots of the day statistics of variability for phase I and II. Boxplots of A the day-\({s}^{2}\) scores, B the day-\(s\) scores, and C the day-ln(\(s\)) scores

Results

Figure 3 shows the control charts that are obtained when applying the EWMA and MEWMA procedures to the day-\(\overline{x }\), day-\({s}^{2}\), day-\(s\) and day-ln(\(s\)) scores of ‘restless’. The red dots indicate the days that are flagged as out-of-control. All four EWMA charts show a clear trend indicating that the affective process goes out-of-control, starting well before the transition to depression on day 76. The first out-of-control score of the EWMA day-\(\overline{x }\) chart is on day 10. The process briefly goes back in-control twice, after which it consistently remains out-of-control. The findings for EWMA day-\({s}^{2}\) and EWMA day-ln(\(s\)) are similar, with a first out-of-control score on day 11 and day 10, respectively. The EWMA day-\(s\) flags the process as out-of-control slightly later, namely on day 15. Unlike the chart for the mean, however, the control charts monitoring variability go back in-control more often around and after the transition. For the EWMA day-ln(\(s\)), there are also a fair number of in-control scores before the transition. The three MEWMA control charts show an out-of-control pattern similar to that of EWMA day-\(\overline{x }\). Aside from a first out-of-control score on day 1 for MEWMA day-\(\overline{x }\)-ln(\(s\)), the process goes out-of-control for all three MEWMA procedures on day 10. After a short period back in-control, the process consistently remains out-of-control.

Fig. 3

Control charts of the day statistics of ‘restless’. EWMA procedure applied to A day-\(\overline{x }\), B day-\({s}^{2}\), C day-\(s\), and D day-ln(\(s\)). MEWMA procedure applied to E day-\(\overline{x }\) and day-\({s}^{2}\), F day-\(\overline{x }\) and day-\(s\), and G day-\(\overline{x }\) and day-ln(\(s\)). The \({T}^{2}\) values of the MEWMA procedure on the y-axis are shown on a logarithmic scale. The dashed vertical line indicates the day of relapse (day 76 in phase II). The dashed horizontal lines indicate the UCL and LCL. The solid horizontal line indicates the center line (CL). The red dots indicate the out-of-control days that fall beyond the control limits

Approach 2: EWMA-type variants for detecting variability changes in the raw scores

The second approach consists of applying an EWMA-type procedure to the raw scores themselves, to detect variance changes or both mean and variance changes. Here we focus on three procedures: EWMA-\({S}^{2}\) and EWMA-ln(\({S}^{2}\)), developed to monitor the variance, and EWMA-\(\overline{X }\)-\({S}^{2}\), developed to jointly monitor the mean and variance. In this section, we start by focusing on the EWMA-\({S}^{2}\) and EWMA-ln(\({S}^{2}\)) procedures, where we first explain the scores monitored in phase II, before turning to the parameter estimation and calculation of the control limits in phase I. Next, we turn to the EWMA-\(\overline{X }\)-\({S}^{2}\) procedure. Finally, we apply the three procedures to the ESM data.

EWMA-\({S}^{2}\) and EWMA-ln(\({S}^{2}\)) procedures

Phase II: Monitoring

The EWMA-\({S}^{2}\) (Knoth, 2005; MacGregor & Harris, 1993) and EWMA-ln(\({S}^{2}\)) (Crowder & Hamilton, 1992) procedures are sensitive not only to variance changes but also to mean changes, as can be derived from the definitions of the exponentially weighted scores. The EWMA-\({S}^{2}\) computes the score \({w}_{{S}^{2},i}\) at each measurement occasion \(i\), based on the squared deviation of the raw scores from the estimated phase I mean:

$${w}_{{S}^{2},i}=\lambda {\left({x}_{i}-{\widehat{\mu }}_{1}\right)}^{2}+\left(1-\lambda \right){w}_{{S}^{2},i-1}.$$

\({w}_{{S}^{2},0}\) is set to \({\widehat{\sigma }}_{1}^{2}\). The definition shows why the EWMA-\({S}^{2}\) scores also respond to mean changes: if the phase II mean \({\mu }_{2}\) differs from \({\widehat{\mu }}_{1}\), the squared deviations increase, even when the variance remains unchanged. The EWMA-ln(\({S}^{2}\)) procedure is very similar to EWMA-\({S}^{2}\), but uses the natural logarithm of the squared differences between the raw scores and the phase I mean:

$${w}_{ln({S}^{2}),i}=\lambda \ln\left[{\left({x}_{i}-{\widehat{\mu }}_{1}\right)}^{2}\right]+\left(1-\lambda \right){w}_{ln({S}^{2}),i-1}.$$

\({w}_{ln({S}^{2}),0}\) is set to –1.270457, which is the expected in-control mean level of \({w}_{ln({S}^{2})}\).
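Both recursions can be sketched on standardized data (phase I mean 0 and SD 1, as required by the spc routines described in the next subsection), in which case \({w}_{{S}^{2},0}=1\). This is a minimal sketch with hypothetical function names, not the spc implementation. The starting value \(-1.270457\) is taken from the text; for reference, it is close to the closed-form expectation \(E[\ln ({Z}^{2})]=-\gamma -\ln 2\approx -1.27036\) for a standard normal \(Z\).

```python
import math
import statistics

W0_LN_S2 = -1.270457                         # starting value given in the text
E_LN_Z2 = -0.5772156649015329 - math.log(2)  # -gamma - ln 2, for comparison


def standardize(phase1, phase2):
    """Standardize both phases with the phase I mean and SD."""
    mu1, sd1 = statistics.mean(phase1), statistics.stdev(phase1)
    return [(x - mu1) / sd1 for x in phase1], [(x - mu1) / sd1 for x in phase2]


def ewma_s2_scores(x_std, lam=0.1, w0=1.0):
    """EWMA-S^2: smoothed squared deviations from the standardized mean 0."""
    w, out = w0, []
    for x in x_std:
        w = lam * x ** 2 + (1 - lam) * w
        out.append(w)
    return out


def ewma_lns2_scores(x_std, lam=0.1, w0=W0_LN_S2):
    """EWMA-ln(S^2): smoothed log squared deviations (x must be nonzero)."""
    w, out = w0, []
    for x in x_std:
        if x == 0:
            raise ValueError("ln(0) is undefined: a deviation of exactly zero")
        w = lam * math.log(x ** 2) + (1 - lam) * w
        out.append(w)
    return out
```

A sustained mean shift alone already drives the EWMA-\({S}^{2}\) scores upward, because every squared deviation from the phase I mean is inflated.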

Phase I: Parameter estimation and control limits

The control limits for both EWMA-\({S}^{2}\) and EWMA-ln(\({S}^{2}\)) can be obtained for a given \({ARL}_{0}\) and \(\lambda\) value using the sewma.crit and lns2ewma.crit functions of the spc package (Knoth, 2005, 2020). We use the ‘unbiased’ mode, which gives asymmetric control limits to account for the deviations from normality of the EWMA-\({S}^{2}\) and EWMA-ln(\({S}^{2}\)) scores. The user also needs to specify the degrees of freedom, which here boil down to the number of scores per measurement occasion, that is, one. Finally, the sewma.crit and lns2ewma.crit functions assume that \({\mu }_{1}\) = 0 and \({\sigma }_{1}\) = 1. We therefore standardize the phase I data and apply the same transformation (i.e., subtracting \({\widehat{\mu }}_{1}\) and dividing by \({\widehat{\sigma }}_{1}\)) to the phase II data.

EWMA-\(\overline{X }\)-\({S}^{2}\) procedure

Phase II: Monitoring

The EWMA-\(\overline{X }\)-\({S}^{2}\) procedure (Gan, 1995; Knoth & Schmid, 2002; Reynolds & Stoumbos, 2001) is a joint monitoring scheme, which considers two separate EWMA charts and looks at when the first out-of-control score occurs in either of the two. Specifically, the EWMA and EWMA-\({S}^{2}\) procedures are applied to the raw scores, to detect changes in the process's mean and variance, respectively.

Phase I: Parameter estimation and control limits

The control limits for the EWMA-\(\overline{X }\)-\({S}^{2}\) procedure can be obtained for a given \({ARL}_{0}\) and \(\lambda\) value using the xsewma.crit function of the spc package (Knoth, 2007, 2020). This function provides the \(L\) value to calculate the control limits of the EWMA procedure, and the UCL and LCL values for the EWMA-\({S}^{2}\) procedure. We again use the ‘unbiased’ mode, set the degrees of freedom to one, and use the standardized data. Note that the resulting control limits will be stricter (i.e., wider) than those for EWMA or EWMA-\({S}^{2}\) alone, to keep the overall type I error under control.
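The joint scheme can be sketched as two EWMA recursions run side by side on standardized phase II data, stopping at the first signal in either chart. In this sketch the limit constants are plain inputs standing in for the values that xsewma.crit would provide; the numbers used in any example call are placeholders, and the function name is hypothetical. If both charts signal at the same occasion, the sketch reports the mean chart.

```python
import math


def joint_first_alarm(x_std, lam, L_mean, lcl_var, ucl_var):
    """First signal of a combined EWMA (mean) + EWMA-S^2 (variance) scheme.

    x_std: phase II scores standardized with the phase I mean and SD.
    L_mean, lcl_var, ucl_var: control-limit constants for the combined
    scheme (supplied by the user here; placeholders in examples).
    Returns (occasion, chart) for the first signal, or (None, None).
    """
    half_width = L_mean * math.sqrt(lam / (2 - lam))
    z, w = 0.0, 1.0  # standardized phase I mean and variance
    for i, x in enumerate(x_std, start=1):
        z = lam * x + (1 - lam) * z       # EWMA chart for the mean
        w = lam * x ** 2 + (1 - lam) * w  # EWMA-S^2 chart for the variance
        if abs(z) > half_width:
            return i, "mean"
        if w < lcl_var or w > ucl_var:
            return i, "variance"
    return None, None
```

For instance, with placeholder limits (\(L = 3\), variance limits .2 and 2), scores alternating between \(+3\) and \(-3\) leave the mean chart quiet but trigger the variance chart at the second occasion.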

Application of the EWMA-\(S^2\), EWMA-ln(\(S^2\)), and EWMA-\(\overline{X }\)-\(S^2\) procedures to the ESM data

Figure 4a shows boxplots of the EWMA, EWMA-\({S}^{2}\), and EWMA-ln(\({S}^{2}\)) scores when applied to the raw scores of ‘restless’, with \(\lambda\) = .10. To make the results comparable to those of the (M)EWMA procedures applied to the day statistics, we again used only the days with at least five measurement occasions (see Figs. 2 and 3), yielding 262 phase I and 1,066 phase II measurement occasions. The distribution of the EWMA-\({S}^{2}\) scores is positively skewed, whereas the distribution of the EWMA-ln(\({S}^{2}\)) scores is less skewed due to the natural logarithm transformation.

Fig. 4

EWMA-\({S}^{2}\), EWMA-ln(\({S}^{2}\)), and EWMA-\(\overline{X }\)-\({S}^{2}\) scores and control charts of ‘restless’. A Boxplots of the EWMA, EWMA-\({S}^{2}\), and EWMA-ln(\({S}^{2}\)) scores, based on the phase II individual measurement occasions of ‘restless’. B EWMA-\({S}^{2}\), C EWMA-ln(\({S}^{2}\)), and D EWMA-\(\overline{X }\)-\({S}^{2}\) control charts of ‘restless’, where EWMA-\(\overline{X }\)-\({S}^{2}\) consists of two control charts: EWMA and EWMA-\({S}^{2}\). The dashed vertical line indicates the middle measurement occasion on the day of relapse (measurement occasion 516 in phase II). The dashed horizontal lines indicate the UCL and LCL. The solid horizontal line indicates the center line (CL). The red dots indicate the out-of-control measurement occasions that fall beyond the control limits. The orange dots indicate the scores that were out-of-control in the EWMA-\({S}^{2}\) chart, but not in the same part of the EWMA-\(\overline{X }\)-\({S}^{2}\) procedure

Figures 4b-d show the control charts obtained when applying the EWMA-\({S}^{2}\), EWMA-ln(\({S}^{2}\)), and EWMA-\(\overline{X }\)-\({S}^{2}\) procedures, respectively, to ‘restless’. After selecting only the days with at least five measurement occasions, we set the \({ARL}_{0}\) to 370 times the average number of measurement occasions per day in phase I (i.e., 370 * 7.485714 ≈ 2,770). This was done to obtain an \({ARL}_{0}\) value comparable to the one used at the day level. Similar to the (M)EWMA on the day statistics, the EWMA-\({S}^{2}\), EWMA-ln(\({S}^{2}\)), and (the two parts of the) EWMA-\(\overline{X }\)-\({S}^{2}\) control charts show a trend indicating that the process goes out-of-control. For both EWMA-\({S}^{2}\) and EWMA-ln(\({S}^{2}\)), the first out-of-control score occurs at measurement occasion 66 (i.e., day 10), which is similar to the results for approach 1 and well before the transition to depression on day 76 in phase II (i.e., measurement occasions 514-519). For EWMA-\(\overline{X }\)-\({S}^{2}\), the first out-of-control score also occurs on day 10. The EWMA chart flags the change at measurement occasion 67, whereas the first out-of-control score in the EWMA-\({S}^{2}\) chart occurs at measurement occasion 68. In all control charts, the process returns within the control limits multiple times.
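The \({ARL}_{0}\) rescaling is simple arithmetic, reproduced below. The number of phase I days (35) is not reported above. It is derived from 262 / 7.485714.

```python
# Rescale the day-level ARL0 of 370 to the level of individual measurement occasions
day_level_arl0 = 370
phase1_occasions = 262      # phase I occasions on days with >= 5 occasions
phase1_days = 35            # derived: 262 / 7.485714 (not reported directly)

occasions_per_day = phase1_occasions / phase1_days
occasion_level_arl0 = day_level_arl0 * occasions_per_day
print(round(occasions_per_day, 6), round(occasion_level_arl0))   # 7.485714 2770
```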

Simulation study

We conducted a simulation study to evaluate and compare the performance of the ten SPC procedures considered (EWMA day-\(\overline{x }\), EWMA day-\({s}^{2}\), EWMA day-\(s\), EWMA day-ln(\(s\)), MEWMA day-\(\overline{x }\)-\({s}^{2}\), MEWMA day-\(\overline{x }\)-\(s\), MEWMA day-\(\overline{x }\)-ln(\(s\)), EWMA-\({S}^{2}\), EWMA-ln(\({S}^{2}\)), and EWMA-\(\overline{X }\)-\({S}^{2}\)) in detecting mean changes, variance changes, or both. We evaluated both the sensitivity and the specificity of the procedures in detecting changes of different sizes by computing \(ARL\) values. As EWMA day-\(\overline{x }\) is sensitive to mean changes, we expect this procedure to be good at detecting such changes. Similarly, EWMA day-\({s}^{2}\), EWMA day-\(s\), and EWMA day-ln(\(s\)) focus on variability, and thus we expect these procedures to be useful for detecting variance changes. As EWMA-\({S}^{2}\), EWMA-ln(\({S}^{2}\)), EWMA-\(\overline{X }\)-\({S}^{2}\), and the three MEWMA variations are all sensitive to both mean and variance changes, we expect these procedures to be able to detect both types of changes. However, we also expect the \({ARL}_{0}\) of the MEWMA procedures to be suboptimal. This multivariate procedure requires a good estimate of the covariance structure, and the phase I data may not be sufficient for this purpose.

The R code to reproduce the simulation study is available at https://osf.io/y9ncq/. We also varied additional parameters (i.e., underlying autocorrelation, number of days in phase I, \(\lambda\) value, number of measurement occasions), which we do not discuss here due to their rather limited influence on the results (see Footnote 9). However, these results can be consulted at https://osf.io/y9ncq/.

Design

Data characteristics

We looked at mean changes of 0, .50, and 1\(\sigma\). Simulation results indicated that the direction of the mean change did not matter, and therefore negative mean changes (i.e., -.50 and -1\(\sigma\)) will not be discussed here (see https://osf.io/y9ncq/). We considered standard deviation changes of -80%, -40%, 0, 40%, and 80%, which correspond to \({\sigma }_{2}\) = .20\({\sigma }_{1}\), .60\({\sigma }_{1}\), \({\sigma }_{1}\), 1.4\({\sigma }_{1}\), and 1.8\({\sigma }_{1}\). The mean and standard deviation changes were introduced at the start of phase II. For each possible combination of a mean change and a standard deviation change, 10,000 replicates were generated. Specifically, the phase I scores were sampled from a standard normal distribution and the phase II scores from a normal distribution whose \({\mu }_{2}\) and \({\sigma }_{2}^{2}\) values were determined by the size of the mean and standard deviation change. We set the number of measurement occasions per day to 10, and the number of phase I days to 100. For computational reasons, we set the number of days in phase II to 10,000 (i.e., the number of observed phase II scores equals 10,000 times the number of measurement occasions per day).
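A minimal Python sketch of how one replicate per design cell could be generated is given below. The published R code at https://osf.io/y9ncq/ is the authoritative version. The sketch assumes independent scores within and across days (i.e., no autocorrelation) and uses NumPy's random generator. The function and constant names are ours.

```python
import numpy as np

MEAN_CHANGES = [0.0, 0.5, 1.0]               # mu_2 - mu_1, in units of sigma_1
SD_MULTIPLIERS = [0.2, 0.6, 1.0, 1.4, 1.8]   # sigma_2 / sigma_1 (-80% ... +80%)
DESIGN = [(m, s) for m in MEAN_CHANGES for s in SD_MULTIPLIERS]


def simulate_replicate(mean_change, sd_mult, rng, n_days_1=100,
                       n_days_2=10_000, per_day=10):
    """One replicate: arrays of shape (days, occasions per day) for each phase.
    Phase I ~ N(0, 1); phase II ~ N(mean_change, sd_mult^2); no autocorrelation."""
    phase1 = rng.standard_normal((n_days_1, per_day))
    phase2 = rng.normal(loc=mean_change, scale=sd_mult, size=(n_days_2, per_day))
    return phase1, phase2


rng = np.random.default_rng(2024)
print(len(DESIGN), len(DESIGN) * 10_000)   # 15 150000
p1, p2 = simulate_replicate(1.0, 1.8, rng)
print(p1.shape, p2.shape)                  # (100, 10) (10000, 10)
```

The 15 design cells times 10,000 replicates give the 150,000 data sets analyzed below.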

Settings of the procedures

Each of the 150,000 data sets was analyzed with all ten SPC procedures. For all procedures, we set \(\lambda\) to .10. Given this \(\lambda\) value, the control limits were set such that the \({ARL}_{0}\) equals 370 for EWMA and MEWMA. For EWMA-\({S}^{2}\), EWMA-ln(\({S}^{2}\)), and EWMA-\(\overline{X }\)-\({S}^{2}\), we set the control limits such that the \({ARL}_{0}\) equals 370 * 10 (i.e., times the number of measurement occasions per day), to have comparable values at the day level.

Next, we assessed the run length of each analysis (i.e., the phase II day on which the first out-of-control score occurred). The run length can be quite long due to the positive skewness of the run length distribution (Schat et al., 2021). If there was no out-of-control score within the 10,000 phase II days, we set the run length to 10,001.
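The run-length bookkeeping, including the censoring at 10,001, can be sketched as follows. The helper that maps an occasion index to a day assumes a fixed number of occasions per day (10 in this design), as needed for the procedures on raw scores. All names are hypothetical.

```python
def occasion_to_day(occasion, per_day=10):
    """Map a 1-based phase II occasion index to its 1-based day (fixed per_day)."""
    return (occasion - 1) // per_day + 1


def run_length(first_signal_day, max_days=10_000):
    """Run length in days; censored at max_days + 1 when no signal occurred."""
    if first_signal_day is None or first_signal_day > max_days:
        return max_days + 1
    return first_signal_day


def average_run_length(run_lengths):
    """ARL: the mean of the (possibly censored) run lengths across replicates."""
    return sum(run_lengths) / len(run_lengths)


# Hypothetical replicates: signal on day 3, signal at occasion 45, no signal
rls = [run_length(3), run_length(occasion_to_day(45)), run_length(None)]
print(rls)                        # [3, 5, 10001]
print(average_run_length(rls))
```

Because censored runs enter as 10,001, a procedure that rarely signals gets an \(ARL\) close to 10,001 rather than an undefined value.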

Performance measures

The performance of the SPC procedures was measured in terms of the \(ARL\). We expect this \(ARL\) to depend on the combination of induced changes and the applied procedure. For instance, for EWMA day-\(\overline{x }\), we predict an \(ARL\) of 370 for every design cell that does not imply a mean change and expect that the \(ARL\) decreases with increasing mean changes. Similarly, for EWMA day-\({s}^{2}\), EWMA day-\(s\), and EWMA day-ln(\(s\)) we expect \(ARL\) values of 370 for every design cell that does not imply a variance change and hypothesize that these values will decrease with larger variance changes.

Results

Figure 5 shows the \(ARL\) curves of the different SPC procedures, for varying sizes of mean and standard deviation changes.

Fig. 5

\(ARL\) curves of the SPC procedures. \(ARL\) curves of A EWMA day-\(\overline{x }\), B EWMA day-\({s}^{2}\), EWMA day-\(s\) and EWMA day-ln(\(s\)), C MEWMA day-\(\overline{x }\)-\({s}^{2}\), MEWMA day-\(\overline{x }\)-\(s\), and MEWMA day-\(\overline{x }\)-ln(\(s\)), and D EWMA-\({S}^{2}\), EWMA-ln(\({S}^{2}\)), and EWMA-\(\overline{X }\)-\({S}^{2}\). The columns indicate the size of the mean change. The \(ARL\) values are shown on a logarithmic scale and the horizontal black line shows the nominal \({ARL}_{0}\) value of 370

EWMA day-\(\overline{x }\)

For EWMA day-\(\overline{x }\), the \({ARL}_{0}\) value (i.e., neither a mean nor a variance change) equals 311.8. There is a clear effect of the size of the mean change: the larger the change, the lower the \(ARL\) value (Fig. 5a). The detection rate is strongly driven by the mean changes, as additional variance changes (decreases or increases) have little impact. This is reflected by the flat \(ARL\) curves for the mean changes of .50 and 1\(\sigma\). However, when there is no mean change, EWMA day-\(\overline{x }\) does detect variance changes, suggesting that this approach is sensitive to both mean and variance changes. For positive variance changes (i.e., increases), the associated \(ARL\) values are lower than the \({ARL}_{0}\) value. For negative variance changes, on the other hand, the \(ARL\) values are higher than the \({ARL}_{0}\) value. This makes sense: the control limits are based on the phase I variance, so a decrease in variance makes it harder for the EWMA day-\(\overline{x }\) scores to exceed them.
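Variance changes alone affect EWMA day-\(\overline{x }\) because its control limits are fixed multiples of the in-control standard deviation. The sketch below computes, for a single EWMA score, the probability of falling outside limits set at \(\pm L\) in-control standard deviations when the true standard deviation is multiplied by a given factor. This is a marginal, per-score calculation that ignores the autocorrelation between successive EWMA scores. It therefore illustrates the direction of the effect rather than the \(ARL\) values. \(L\) = 2.703 is an assumed value for \(\lambda\) = .10 and an \({ARL}_{0}\) of 370.

```python
import math


def exceed_prob(L, sd_ratio):
    """P(|Z| > L / sd_ratio): chance that one EWMA score with the in-control mean
    falls outside +/- L in-control SDs when its true SD is sd_ratio times larger."""
    return math.erfc(L / sd_ratio / math.sqrt(2.0))


L = 2.703   # assumed limit width for lambda = .10 and ARL0 = 370
for ratio in [0.2, 0.6, 1.0, 1.4, 1.8]:     # sigma_2 / sigma_1
    print(f"sigma ratio {ratio}: {exceed_prob(L, ratio):.2e}")
```

Exceedances become far more likely when the variance increases and practically impossible when it decreases strongly, which matches the direction of the \(ARL\) shifts described above.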

EWMA day-\(s^2\), EWMA day-\(s\), and EWMA day-ln(\(s\))

The \({ARL}_{0}\) values for these three procedures equal 329.8, 309.0, and 308.0, respectively. For EWMA day-\({s}^{2}\), EWMA day-\(s\), and EWMA day-ln(\(s\)), there is a clear effect of the size of the variance change: the larger the change, the lower the \(ARL\) value (Fig. 5b). None of the three procedures is sensitive to mean changes, as the \(ARL\) values remain equal to the \({ARL}_{0}\) values even if a mean change occurs. Overall, EWMA day-\(s\) has relatively symmetric \(ARL\) curves, indicating that this procedure performs equally well in detecting positive (i.e., increases) and negative (i.e., decreases) variance changes. EWMA day-\({s}^{2}\) has asymmetric \(ARL\) curves, which are steeper for positive variance changes, suggesting that this procedure is better at detecting increases in variance. EWMA day-ln(\(s\)), on the other hand, has a steeper \(ARL\) curve for negative variance changes, suggesting that it is better at detecting decreases in variance. The (a)symmetry of these \(ARL\) curves is related to the distribution of the EWMA scores for the different sizes of variance change. Figure 6 shows an example of EWMA day-\({s}^{2}\), EWMA day-\(s\), and EWMA day-ln(\(s\)) scores for the different variance changes (without a mean change). The variance changes affect the location and width of the boxplots in nonlinear ways for the EWMA day-\({s}^{2}\) and EWMA day-ln(\(s\)) procedures, which explains why certain settings produce more out-of-control scores.
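The nonlinear effects follow from how each statistic responds when the raw-score standard deviation is multiplied by a factor \(c\). For normal data and a fixed number of occasions per day, \(s\) is multiplied by \(c\), \({s}^{2}\) by \({c}^{2}\), and ln(\(s\)) is shifted by ln(\(c\)) with an unchanged spread. The sketch below tabulates the resulting shift in the center of each statistic, with in-control \(\sigma\) = 1 and ignoring the small bias of \(s\). It is an illustration of this scaling argument, not the analysis behind Fig. 6.

```python
import math

SD_CHANGES = [-0.8, -0.4, 0.0, 0.4, 0.8]   # relative change in the raw-score SD


def center_shift(stat, change):
    """Shift of the center of a day statistic of variability (in-control sigma = 1)
    when the raw-score SD is multiplied by c = 1 + change.
    Exact scalings for normal data: s -> c*s, s^2 -> c^2*s^2, ln(s) -> ln(s) + ln(c)."""
    c = 1.0 + change
    if stat == "s2":
        return c ** 2 - 1.0   # E[s^2] moves from 1 to c^2; spread scales by c^2
    if stat == "s":
        return c - 1.0        # center moves proportionally (c4 bias ignored); spread scales by c
    if stat == "ln_s":
        return math.log(c)    # whole distribution shifts; spread unchanged
    raise ValueError(f"unknown statistic: {stat}")


for stat in ("s2", "s", "ln_s"):
    print(stat, [round(center_shift(stat, d), 3) for d in SD_CHANGES])
```

For ln(\(s\)), an 80% decrease moves the distribution much further (ln .2 ≈ -1.61) than an 80% increase (ln 1.8 ≈ 0.59). For \({s}^{2}\), the opposite holds (-0.96 versus +2.24). For \(s\), the shifts are symmetric. This is consistent with the shapes of the \(ARL\) curves.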

Fig. 6

Examples of EWMA scores of the day statistics of variability. Boxplots of the A EWMA day-\({s}^{2}\) scores, B EWMA day-\(s\) scores, and C EWMA day-ln(\(s\)) scores. The dashed horizontal lines indicate the LCL and UCL for each procedure

MEWMA

The \({ARL}_{0}\) values of MEWMA day-\(\overline{x }\)-\({s}^{2}\), MEWMA day-\(\overline{x }\)-\(s\), and MEWMA day-\(\overline{x }\)-ln(\(s\)) are 218.7, 224.8, and 222.5, respectively (Fig. 5c). These are considerably lower than the \({ARL}_{0}\) values of the other procedures. This was expected, as multivariate procedures require an accurate estimate of the covariance matrix and therefore more phase I data. The three procedures are sensitive to both mean and variance changes, with lower \(ARL\) values for larger changes. The asymmetry of the MEWMA day-\(\overline{x }\)-\({s}^{2}\) and MEWMA day-\(\overline{x }\)-ln(\(s\)) curves is in line with the asymmetry discussed above: the former is better at detecting positive variance changes and the latter at detecting negative variance changes.
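For reference, below is a minimal sketch of the standard MEWMA statistic applied to day statistics. The phase I covariance estimate enters through its inverse, so a noisy estimate based on few phase I days directly affects the in-control behavior. The sketch uses the asymptotic covariance of the smoothed vector and leaves out the control limit \(h\). Both are simplifying assumptions, not the exact settings used here, and the phase I estimates in the usage lines are hypothetical.

```python
import numpy as np


def mewma_t2(day_stats, mu, cov, lam=0.10):
    """MEWMA T^2 statistics for p-variate day statistics (rows = days).

    mu, cov : phase I estimates of the in-control mean vector and covariance.
    Uses the asymptotic covariance of the smoothed vector, lam / (2 - lam) * cov.
    """
    x = np.asarray(day_stats, dtype=float) - np.asarray(mu, dtype=float)
    cov_z_inv = np.linalg.inv(lam / (2.0 - lam) * np.asarray(cov, dtype=float))
    z = np.zeros(x.shape[1])
    t2 = []
    for row in x:
        z = lam * row + (1.0 - lam) * z      # exponentially weighted vector
        t2.append(float(z @ cov_z_inv @ z))  # squared Mahalanobis distance
    return np.array(t2)


# Hypothetical phase I estimates for [day mean, day s] and three phase II days
mu_hat = [0.0, 1.0]
cov_hat = [[1.0, 0.2], [0.2, 0.5]]
print(mewma_t2([[0.5, 1.2], [0.8, 1.5], [1.1, 1.6]], mu_hat, cov_hat))
```

A day is flagged as out-of-control when its \(T^2\) value exceeds the upper control limit \(h\).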

EWMA-\(S^2\) and EWMA-ln(\(S^2\))

The EWMA-\({S}^{2}\) and EWMA-ln(\({S}^{2}\)) procedures have \({ARL}_{0}\) values of 340.1 and 330.0, respectively. When there is no mean change, we find a clear effect of the size of the variance change (Fig. 5d). When there is a mean change but no variance change, the procedures are able to detect the mean change. The two procedures are in some cases also sensitive to simultaneous mean and variance changes. However, there is a complex and, at first sight, unexpected interaction between the size and direction of the mean and variance changes. Specifically, negative variance changes (i.e., decreases) in combination with a mean change are challenging to detect, leading to some very high \(ARL\) values. To better understand why this is the case, Fig. 7 shows boxplots of \({S}^{2}\) and ln(\({S}^{2}\)) scores (without exponential weighting) for four different settings: 1) no change, 2) \(\mu\) change of 0 and \(\sigma\) change of -80%, 3) \(\mu\) change of .5\(\sigma\) and \(\sigma\) change of -80%, and 4) \(\mu\) change of 1\(\sigma\) and \(\sigma\) change of -80%. If we compare the \({S}^{2}\) scores (Fig. 7a) in settings 1 and 2, we see that the \(\sigma\) change of -80% shifts the distribution down and substantially decreases its width. There is little overlap between the distributions, leading to a low \(ARL\) value for setting 2. Introducing a mean change on top of the \(\sigma\) change (settings 3 and 4) not only shifts the distributions up but also increases their width. Thus, for the same standard deviation change of the original scores, an additional mean change does more than shift the distribution of the \({S}^{2}\) scores up: it also changes its spread. For the mean change of .50\(\sigma\), this still leads to sufficient differentiation from setting 1, resulting in a low \(ARL\) value. For the mean change of 1\(\sigma\), by contrast, there is a lot of overlap, leading to a very high \(ARL\) value.
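The pattern in Fig. 7a can also be derived analytically. Assume standardized scores and \({S}^{2}\) computed as the squared deviation from the in-control mean of 0 (one score, df = 1). A phase II score \(x \sim N(\delta, c^2)\) then gives \(E[{S}^{2}] = c^2 + \delta^2\) and \(\mathrm{Var}[{S}^{2}] = 2c^4 + 4c^2\delta^2\), because \(x^2/c^2\) follows a noncentral \(\chi^2_1\) distribution. The sketch below evaluates these moments for the four settings of Fig. 7.

```python
import math


def s2_moments(mean_change, sd_mult):
    """Mean and SD of S^2 = x^2 for one standardized score x ~ N(delta, c^2).
    x^2 / c^2 is noncentral chi-square(1, delta^2 / c^2), hence
    E[x^2] = c^2 + delta^2 and Var[x^2] = 2 c^4 + 4 c^2 delta^2."""
    d, c = mean_change, sd_mult
    mean = c ** 2 + d ** 2
    sd = math.sqrt(2 * c ** 4 + 4 * c ** 2 * d ** 2)
    return mean, sd


settings = {1: (0.0, 1.0), 2: (0.0, 0.2), 3: (0.5, 0.2), 4: (1.0, 0.2)}
for k, (d, c) in settings.items():
    m, s = s2_moments(d, c)
    print(f"setting {k}: mean = {m:.3f}, SD = {s:.3f}")
# setting 1: mean = 1.000, SD = 1.414
# setting 2: mean = 0.040, SD = 0.057
# setting 3: mean = 0.290, SD = 0.208
# setting 4: mean = 1.040, SD = 0.404
```

In setting 4, the mean of \({S}^{2}\) (1.04) is almost identical to the in-control mean (1). This explains the heavy overlap and the very high \(ARL\), whereas setting 3 still sits well below the in-control level.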

Fig. 7

Example boxplots of A \({S}^{2}\) and B ln(\({S}^{2}\)) scores for four settings: 1) \({\mu }_{2}\) = 0 and \({\sigma }_{2}\) = 1, implying no change, 2) \({\mu }_{2}\) = 0 and \({\sigma }_{2}\) = .20, 3) \({\mu }_{2}\) = .50 and \({\sigma }_{2}\) = .20, and 4) \({\mu }_{2}\) = 1 and \({\sigma }_{2}\) = .20

For the ln(\({S}^{2}\)) scores (Fig. 7b), we see a different pattern. Comparing setting 1 with setting 2 (i.e., \(\sigma\) change of -80%), we observe that the distribution of the ln(\({S}^{2}\)) scores shifts down whereas its width stays the same (which makes sense due to the natural logarithm transformation; Knoth, 2005). There is not much overlap, which leads to a low \(ARL\) value (Fig. 5d). Introducing an additional mean change (settings 3 and 4) shifts the distribution of the ln(\({S}^{2}\)) scores upward and decreases its width. In these two cases, there is substantial overlap with the in-control distribution, leading to high \(ARL\) values (Fig. 5d).

EWMA-\(\overline{X }\)-\(S^2\)

The \({ARL}_{0}\) value of the EWMA-\(\overline{X }\)-\({S}^{2}\) procedure equals 333.0. When there is no mean change, we find a clear effect of variance changes (Fig. 5d), with lower \(ARL\) values for larger changes. The \(ARL\) values are slightly higher than those obtained when using only the EWMA-\({S}^{2}\) procedure, due to the wider control limits needed to keep the overall type I error constant (i.e., an \({ARL}_{0}\) of 370). As expected, the procedure is also sensitive to mean changes and to simultaneous mean and variance changes. For the \(\sigma\) change of -40% in combination with the mean change of .50\(\sigma\), we again see that the ability to detect these changes is somewhat compromised. The \(ARL\) value for this setting is slightly lower for EWMA-\(\overline{X }\)-\({S}^{2}\) than for EWMA-\({S}^{2}\) alone, as the underlying EWMA procedure benefits from the mean change. Detecting this mean change is still somewhat challenging, however, because of the accompanying negative variance change.

Recommendations

From a clinical perspective, it is desirable to use a procedure that gives consistent as well as expected results: a high \({ARL}_{0}\) value when no change occurred and low \(ARL\) values when a change occurred, irrespective of the direction of change. We see such behavior, for example, for the EWMA procedures applied to the day statistics of variability (Fig. 5b). These procedures show consistent patterns across different settings: high \(ARL\) values when there is no variance change and low \(ARL\) values when there is a variance change. The differences in \(ARL\) values between these three methods are relatively small, and one may come to the same conclusions when applying them in practice. The EWMA-\({S}^{2}\) and EWMA-ln(\({S}^{2}\)) procedures, and to a lesser extent the EWMA-\(\overline{X }\)-\({S}^{2}\) procedure, however, show unexpected behavior for certain settings. Specifically, decreases in variance are detected less well, or not at all, when there is also a change in mean level. Such unexpected and inconsistent behavior may be undesirable when one does not know what type and/or direction of change to expect. Therefore, we provide the following recommendations on when to use which procedure.

If one is only interested in detecting mean changes, we recommend applying the EWMA procedure to the day-\(\overline{x }\) (Fig. 5a). However, one should be aware that this method can also pick up on variance changes. When only interested in variance changes, we recommend applying the EWMA procedure to a day statistic of variability (Fig. 5b). Specifically, use EWMA day-\({s}^{2}\) when one expects an increase in variation, EWMA day-ln(\(s\)) when one expects a decrease in variation, and EWMA day-\(s\) when unsure about the direction. When interested in detecting both mean and variance changes, we advise applying the MEWMA procedure to the day-\(\overline{x }\) and a day statistic of variability (Fig. 5c). However, users should keep in mind that more measurement occasions (i.e., days) are then required in phase I to attain acceptable \({ARL}_{0}\) values.

Discussion

SPC procedures, and the (M)EWMA procedure in particular, seem promising methods to screen for early warning signals of depression in real time. Whereas previous research on this topic focused on mean changes (Schat et al., 2021; Smit et al., 2019; Smit & Snippe, 2022; Snippe et al., 2022), we further expanded the SPC toolset by developing and comparing approaches that allow for the detection of variance changes as well. This is important as there is empirical evidence that the amount of variance provides unique information on psychopathology levels (Dejonckheere et al., 2019). We first developed a novel approach in which a day statistic of variability (i.e., day-\({s}^{2}\), day-\(s\), or day-ln(\(s\))) is computed and monitored for mean changes using EWMA. Building on this idea, we suggested applying the MEWMA procedure to both day averages and one of the three day statistics of variability if one wants to detect both mean and variance changes. We compared these novel approaches to three existing EWMA-type procedures (i.e., EWMA-\({S}^{2}\), EWMA-ln(\({S}^{2}\)), and EWMA-\(\overline{X }\)-\({S}^{2}\)) that have been developed to detect a combination of mean and variance changes in the raw scores. We illustrated the behavior of the different procedures on publicly available ESM data of an individual who relapsed into depression, and further investigated their performance by means of a simulation study. We first discuss the obtained results and provide recommendations. Next, we discuss a number of challenges to be tackled in future research.

Results and recommendations

Analyzing the ESM item ‘restless’ of a patient who relapsed into depression after an antidepressant dose reduction (Groot, 2010; Wichers et al., 2016) showed that the (M)EWMA procedures applied to the day statistics and the EWMA-\({S}^{2}\), EWMA-ln(\({S}^{2}\)), and EWMA-\(\overline{X }\)-\({S}^{2}\) procedures applied to the raw scores performed very similarly in terms of first out-of-control scores. These first out-of-control scores already occurred on day 10 of phase II, well ahead of the relapse into depression, and thus would have allowed for preventive action. However, these results were found for both mean changes and variance changes, which raises the question of how well the different approaches can distinguish between the two.

This question was investigated further in the simulation study, in which we manipulated the size of the mean change and the size of the variance change. Each generated data set was analyzed with all ten considered procedures (i.e., seven procedures using day statistics and three using raw scores). The simulation results indicate that applying the EWMA and MEWMA procedures to day statistics works well for detecting mean and/or variance changes. EWMA day-\(\overline{x }\) is a good approach to detect mean changes, which is in line with previous research (Schat et al., 2021). However, we gained the additional insight that this procedure also detects increases in variance when there is no mean change, which is important information when trying to disentangle the two. The reason is that the control limits of this procedure are a function of the in-control \({\widehat{\sigma }}_{1}\) (Hawkins & Deng, 2009). The EWMA procedures applied to the day statistics of variability proved insensitive to mean changes and differed in performance depending on the direction of the variance change. EWMA day-\({s}^{2}\) performed best at detecting increases, and EWMA day-ln(\(s\)) performed best at detecting decreases. When one is uncertain about the direction of change, we recommend using the EWMA day-\(s\) procedure, as this procedure performs equally well in both directions. Furthermore, the simulation results for the EWMA-\({S}^{2}\) and EWMA-ln(\({S}^{2}\)) procedures revealed a complex interaction between the direction and type of change. As the \(ARL\) values can be very high, even when inducing clear changes, we do not recommend applying these procedures in practice. Although the EWMA-\(\overline{X }\)-\({S}^{2}\) procedure performs relatively well, it also shows unexpected behavior when a mean change occurs alongside a negative variance change.

Regarding the design of ESM studies, we recommend including at least five assessment moments per day if one is interested in monitoring variance levels using SPC. Even with missing values, there will then likely be sufficient completed measurement occasions per day to obtain a statistic of variability. Computing and monitoring day statistics of variability is still possible with three measurement occasions per day. However, it becomes problematic with only two measurement occasions per day and impossible with only one. Although the performance of the EWMA procedure applied to day statistics of variability slightly worsens with only three measurement occasions per day, it is still able to detect variance changes within a reasonable time frame (see https://osf.io/y9ncq/ for simulation results). A lower number of measurement occasions per day has the largest impact on the EWMA day-\({s}^{2}\) procedure (i.e., a noticeable increase in the \({ARL}_{0}\)). Thus, when the number of measurement occasions per day is limited, we recommend using the EWMA day-\(s\) procedure.
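Computing day statistics with a minimum number of completed occasions can be sketched as follows. The threshold of five follows the recommendation above. Several choices are our assumptions: the data layout as (day, score) pairs, the use of the \(n-1\) standard deviation, and returning no ln(\(s\)) value for days without variability. With Likert-type items, identical answers within a day are possible, so the analyst has to decide how to handle the resulting ln(0).

```python
import math
import statistics
from collections import defaultdict


def day_statistics(records, min_occasions=5):
    """records: iterable of (day, score) pairs; missing scores are None.
    Returns {day: {...}} for days with at least min_occasions completed scores."""
    by_day = defaultdict(list)
    for day, score in records:
        if score is not None:
            by_day[day].append(score)
    out = {}
    for day in sorted(by_day):
        scores = by_day[day]
        if len(scores) < max(min_occasions, 2):   # s needs at least two scores
            continue
        s = statistics.stdev(scores)
        out[day] = {
            "mean": statistics.fmean(scores),
            "s2": s * s,
            "s": s,
            "ln_s": math.log(s) if s > 0 else None,  # undefined for zero variability
        }
    return out


# Toy records: day 1 complete, day 2 with only three completed scores
records = [(1, x) for x in [1, 2, 3, 4, 5]] + [(2, 4), (2, None), (2, 6), (2, 5)]
print(sorted(day_statistics(records)))                    # [1]
print(sorted(day_statistics(records, min_occasions=3)))   # [1, 2]
```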

Future directions

In the paper of Schat et al. (2021), a number of future directions were mentioned concerning SPC procedures themselves as well as their application in psychopathology research (e.g., missing data, alternatives for the phase I period), which we will not reiterate here. This paper revealed additional points deserving attention. They pertain to empirical applications, the choice of the monitored statistic, multivariate data, non-parametric SPC procedures, and the effect of using estimated versus known in-control (i.e., phase I) parameters.

Empirical applications

Empirical studies on larger samples of participants of an antidepressant reduction experiment suggest that the EWMA day-\(\overline{x }\) procedure is useful to predict whether or not an individual will relapse into depression (Smit & Snippe, 2022; Snippe et al., 2022). In future research, it would be interesting to expand on these findings by investigating whether these results can be further improved by screening for variance changes as well. Such improvement can take different forms: First, it may be that we can further increase the accuracy of our predictions about which patients relapse. Second, including variance changes may improve the speed with which we can determine whether someone is at risk of a depressive episode, which would allow for earlier interventions.

Choice of the monitored statistic

A clear advantage of using day statistics is that one can in principle use any statistic of interest and monitor for mean changes in the chosen statistic. When interested in variance changes, one could also use a measure of variability that corrects for the mean. Mestdagh et al. (2018), for example, proposed the relative variability index to disentangle variability and means in the case of bounded measurement scales. Specifically, this measure takes into account the maximum amount of variance one can have, given a particular observed mean and the measurement scale used.
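A simplified version of this idea is sketched below. The observed variance is divided by the largest variance attainable on the scale given the observed mean, \((\text{max} - \bar{x})(\bar{x} - \text{min})\). This is not the exact relative variability index of Mestdagh et al. (2018), which works with the sample variance and the maximum attainable for the given number of observations. It only illustrates the principle of correcting variability for the mean on a bounded scale.

```python
import statistics


def relative_variability(scores, lower, upper):
    """Simplified relative variability: observed (population) variance divided by
    the largest variance attainable on [lower, upper] given the observed mean,
    (upper - mean) * (mean - lower)."""
    m = statistics.fmean(scores)
    max_var = (upper - m) * (m - lower)
    if max_var == 0:
        return float("nan")   # all scores at one bound: ratio undefined
    return statistics.pvariance(scores) / max_var


print(relative_variability([0, 0, 100, 100], 0, 100))   # 1.0 (maximal given the mean)
print(relative_variability([40, 60], 0, 100))
```

Two days with the same raw variance can thus receive different values depending on how close their mean lies to a scale bound.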

Multivariate data

In the illustrative example as well as the simulation study, we focused on analyzing single variables. However, ESM studies usually include multiple items (e.g., Cloos et al., 2022; Eisele et al., 2020; Snippe et al., 2022), which may vary in how predictive they are of future relapses. Therefore, it may be interesting to monitor for variance (and mean) changes in more than one variable simultaneously. In such cases, we can, for instance, apply the MEWMA procedure to multiple day statistics of variability (i.e., one for each variable), or apply the single-variable procedures to (weighted) sum scores of the different items. More research is needed to investigate how these different options perform, and how the presence of noise variables (i.e., variables that do not predict relapse) influences performance.

Non-parametric SPC procedures

We observed that the distribution of the day-\({s}^{2}\) scores, and to a lesser extent the day-\(s\) scores, is positively skewed (Fig. 2). Moreover, the distribution of the day-\(\overline{x }\) scores may also be skewed when the ESM scores themselves are not normally distributed (Schat et al., 2021), which is often the case for negative affect (Heininga et al., 2019). It may thus be interesting to investigate whether we can further improve performance by using SPC procedures that do not impose the assumption of normality implied by EWMA. It particularly makes sense for future research to look into the growing body of research on non-parametric SPC procedures (see e.g., Liang et al., 2022; Mukherjee & Chakraborti, 2012; Qiu, 2018; Zou & Tsung, 2010). While promising, these methods are somewhat more challenging to apply because they are not readily available in R and require further tuning of the method to the problem at hand. For example, non-parametric procedures based on data categorization and categorical data analysis require users to set a number of categories to be used (Qiu, 2018). This calls for recommendations on how to best tweak these parameters for ESM data. Moreover, it should be investigated whether non-parametric procedures truly outperform the EWMA-based methods at detecting changes in ESM data.

Estimated versus known in-control parameters

In practice, researchers will almost never know the parameter values (i.e., \({\mu }_{1}\) and \({\sigma }_{1}\)) that govern the in-control distribution, which explains the need for phase I data to estimate these parameters. However, when using these parameter estimates, the observed \({ARL}_{0}\) value may differ from the nominal \({ARL}_{0}\) value if there is insufficient phase I data to obtain reliable estimates. For the EWMA procedure, it has been shown that one only obtains the set \({ARL}_{0}\) value if one uses known parameters or a very large number of phase I observations (> 7000; Saleh et al., 2015) and the raw scores are normally distributed. The \(\lambda\) value has an additional influence on this, where lower \(\lambda\) values lead to lower \({ARL}_{0}\) values when using estimated parameters. In terms of \({ARL}_{1}\) values, the differences are minimal and will in practice lead to first out-of-control scores that differ by only a few days.

The realization that the \({ARL}_{0}\) values based on estimated parameters will deviate from the nominal \({ARL}_{0}\) value raises the question of how to determine and compare the performance of SPC procedures. For example, should one simply look for the method that best approaches the nominal \({ARL}_{0}\) value, or for the method for which the difference between the \({ARL}_{0}\) values based on estimated and known parameters is small? Additionally, one can look for the method for which the difference between the \({ARL}_{0}\) and \({ARL}_{1}\) is as large as possible.

Conclusions

Applying the EWMA and MEWMA procedures to day statistics is promising for detecting mean and variance changes in psychopathology research. We provide recommendations on which day statistic(s) to use depending on the type and direction of the change(s) one expects to see in the data.