Longitudinal studies are increasingly common in educational and psychological research settings. In some cases, subjects are measured repeatedly over time in order to examine their individual growth and the potential differences among them. In other cases, subjects assigned to different experimental conditions are treated for a specific period of time and when the study is finished they are compared with respect to their average growth rates. Whatever the purpose of the study, it is usual and reasonable to model the change in the response of interest assuming linear growth (Willett, 1988) and to express the effect of the intervention in terms of the difference in mean slopes or rates of change among groups over time.

A wide variety of methods based on classical linear models can be applied to the analysis of longitudinal data. However, imbalance due to missing responses from some subjects, together with the fact that observations from the same subject are generally correlated, can lead to erroneous conclusions regarding the hypotheses of interest. Among other reasons, this is why multilevel hierarchical linear models have become the method of choice for modeling change in the response over time and the factors influencing that change.

Modeling longitudinal data using a hierarchical system of regression equations requires sufficient experimental units in order to detect the effects of interest at the desired power level. Hence, it is advisable to determine the sample size when planning a longitudinal study. Numerous publications have explained how to calculate the sample size in this type of study (e.g., Heo, Xue, & Kim, 2013; Muthén & Curran, 1997; Raudenbush & Liu, 2001; Usami, 2014; Wänström, 2009). There are also many software packages (e.g., ACluster, nQuery, OptimalDesign, PASS, PinT, or RMASS2) that can be used to perform sample size/power calculations with multilevel data. However, very few publications have dealt with informing researchers on this topic about errors due to heterogeneous variances across treatment groups and/or when it is expected that some subjects will leave the study prematurely (Hedeker, Gibbons, & Waternaux, 1999; Heo, 2014; Roy, Bhaumik, Aryal & Gibbons, 2007; Vallejo, Ato, Fernández, Livacic-Rojas, & Tuero-Herrero, 2016).

Loss of subjects invariably occurs in longitudinal studies, potentially leading to inefficient analyses and invalid conclusions. The existence of heterogeneity has been found in several reviews of studies published in psychology journals (cf. Erceg-Hurn & Mirosevich, 2008). This phenomenon is not only likely to occur in nonrandomized intervention studies, but it can also occur in completely randomized experiments. Some common causes of heterogeneity in real data are problems related to measurement validity, research design, and analysis (e.g., unclear randomization, high dropout rates, small sample sizes, presence of floor or ceiling effects in treatment outcome measures, differential treatment effects across subjects, or bad data). Regardless of the potential sources of heterogeneity, neglecting heterogeneity when it is present can lead to inefficient and potentially misleading inferences about fixed effects. For more detailed information about why heterogeneity occurs in intervention studies, see Grissom and Kim (2012). Also, Keselman, Algina, Lix, Wilcox, and Deering (2008) discuss the impact that heterogeneous variances have on error probabilities.

For the derivation of the power function, it is generally assumed that all variance components included in the multilevel models are known. When suitable prior information is not available, specification of these random components is sometimes a difficult task. In these cases, a possible solution is to simplify the procedure of power analysis by assuming that some effects vary randomly between subjects or clusters, whereas others are constrained to be fixed effects (e.g., a model with nonrandomly varying slopes). These restrictions are sometimes specified in applied research (e.g., Heo & Leon, 2008, 2009). When a source of variation is completely ignored, however, this can lead to overly optimistic sample size and power calculations. For instance, if random-intercept models are used inappropriately, given that both random-intercept and -slope models need to be considered, there is a considerable risk of finding high apparent power, because the so-called random-intercept model generally has a poor control of the Type I error rate (Vallejo, Ato, & Valdés, 2008).

Usami (2014) has developed a procedure that can be applied in order to examine the statistical power to detect a significant group-by-time interaction in a two-level random-coefficient regression model, especially when no informative variance components are available. However, this author confined the development of the proposed method for investigating sample size requirement to detecting an intervention effect based on two groups for situations that assume a linear growth pattern of the outcomes over time, complete data for every subject, and homogeneous errors at both Levels 1 and 2. Subsequently, Vallejo et al. (2016) extended the procedure proposed by Usami (2014) to situations in which the presence of between-subjects heterogeneity can be reasonably predicted and the influence of attrition taken into consideration. However, the formulas derived by Vallejo et al. (2016) are restricted to models that assume a linear change in responses over time. Furthermore, the adequacy of the sample size determination formulas for heterogeneous and incomplete data has not been investigated.

The present study extended the work of Vallejo et al. (2016) so as to overcome the aforementioned limitations and, therefore, can be viewed as a generalization of the corresponding results of these authors. Specifically, our objective in this article is threefold: first, to extend the method originally proposed by Usami (2014) and later updated by Vallejo et al. (2016) to more complex growth models for power and sample size determinations; second, to carry out a Monte Carlo study to verify the statistical power achieved with the estimated sample sizes; and third, to check whether the theoretical statistical power based on estimates by ordinary least squares (OLS) differs from the empirical statistical power based on maximum likelihood (ML) estimates, by means of Monte Carlo simulations. In this study, we used restricted ML (REML) as the estimation method because, in multilevel modeling, REML estimates of variance components tend to be less biased than unrestricted ML estimates (Browne & Draper, 2000).

Formulation of a statistical model

Suppose we are interested in comparing the longitudinal trends of two groups, experimental versus control, in a numeric dependent variable. Considering that measures taken over time are nested in subjects, such data can be analyzed using a hierarchical regression model with two levels. At the first level, we represent the change we expect each subject of the population to experience during a specific period of time, whereas at the second level we describe the conjectured relationship between the parameters of individual growth and the explanatory variables that are assumed stable for the whole duration of the study.

Adopting an individual growth model in which change is a linear function of time, the Level 1 model can be formulated as follows:

$$ {Y}_{it}={b}_{0i}+{b}_{1i}{X}_{it}+{e}_{it}, $$
(1)

where Yit denotes the response of the ith subject (i = 1, . . . , N) at the tth measurement occasion (t = 1, . . . , T), Xit defines the specific time (e.g., days) at which this subject is observed, and the random parameters b0i (intercept), b1i (slope or rate of change), and eit (error term) represent, respectively, the true value of the subject’s response at baseline, the rate of change during the period of data collection, and the measurement error caused by deviation from linearity. In the absence of missing data, we assume that Xit = Xt for all i, and that measurements of the response from baseline (X1 = 0) to the last time point increase at time intervals of unit length, so that D = T − 1. It is important to observe that starting the time coding with X1 = 1 instead of X1 = 0 would be equivalent, but more difficult to interpret, because the value zero would then lie outside the range of observed measurement occasions.

At the second level, the parameters resulting from modeling the trajectories of individual change over time, are related to the explanatory variables that describe the differences between subjects in intercepts and slopes. If we have only one explanatory variable (e.g., a behavioral intervention to improve the language of autistic children), the Level 2 model becomes

$$ {b}_{0i}={\beta}_{00}+{u}_{0i}, $$
(2)
$$ {b}_{1i}={\upbeta}_{10}+{\upbeta}_{11}{W}_i+{u}_{1i}, $$
(3)

where the indicator variable of the intervention program is Wi = 0 if the ith Level 2 unit is assigned to the control group, and Wi = 1 if it is assigned to the experimental group. Because of the randomization of subjects to the two treatment groups, the Level 2 model for the intercept does not contain the group-level variable Wi, and we assume a common mean response at time t = 0. In this model, β00 is the mean baseline response in both the treatment and control groups, because no treatment main effect is assumed; β10 is the average rate of change of the control group; and β11 is the difference between the groups’ average rates of change. As a result, the average rate of change of the experimental group corresponds to the sum β10 + β11. The random variables u0i and u1i are independent of eit, and it is assumed that they follow a bivariate normal distribution with mean zero, variances τ00 and τ11, respectively, and covariance τ01.

Note that Eq. 2 specifies no predictors for b0i. Suppose, however, that this intercept depends on Wi. One might then formulate another form of the random-intercept model. Specifically, b0i = β00 + β01Wi + u0i, where β01 is the main effect of the treatment W on b0i. In this case, residual variance components τ00 and τ11, represent the variability that remains in parameters b0i and b1i after controlling the effect due to the program.

By substituting Eqs. 2 and 3 into Eq. 1, the mixed or combined model can be expressed as follows:

$$ {Y}_{it}={\beta}_{00}+{\beta}_{10}{X}_{it}+{\beta}_{11}{W}_i{X}_{it}+\left({u}_{1i}{X}_{it}+{u}_{0i}+{e}_{it}\right). $$
(4)

With no assumptions about group differences at baseline, Eq. 4 should also include Wi as a predictor. It is often assumed that the errors eit, conditional on u1i and u0i, are normally and independently distributed with mean zero and constant variance σ2. In this study we also considered the presence of heterogeneous variances across treatment groups, although we maintain the assumption that the errors are normally distributed.

Under the combined model of Eq. 4, the expected value, variance, and covariance of the measurements Yit, conditional on the explanatory variables, are given by

$$ E\left({Y}_{it}\right)={\beta}_{00}+\left({\beta}_{10}+{\beta}_{11}{W}_i\right){X}_{it}, $$
(5)
$$ Var\left({Y}_{it}\right)={\tau}_{00}+2{X}_{it}{\tau}_{01}+{X}_{it}^2{\tau}_{11}+{\sigma}^2, $$
(6)
$$ Cov\left({Y}_{it},{Y}_{i{t}^{\prime }}\right)={\tau}_{00}+\left({\mathrm{X}}_{it}+{\mathrm{X}}_{i{t}^{\prime }}\right){\tau}_{01}+{\mathrm{X}}_{it}{\mathrm{X}}_{i{t}^{\prime }}{\tau}_{11}. $$
(7)

If baseline values differ across groups, then Eq. 5 should also include the term β01Wi. (For more details on these equations, see Appendix 1.)
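As a concrete illustration, the moments in Eqs. 5–7 can be evaluated directly. The parameter values below are illustrative placeholders chosen for the example, not values taken from this article:

```python
# Sketch of Eqs. 5-7 for the linear growth model. The default parameter
# values (beta_00, beta_10, beta_11, tau_00, tau_01, tau_11, sigma2) are
# arbitrary illustrations.

def mean_Y(X, W, beta_00=0.0, beta_10=0.2, beta_11=0.3):
    """Eq. 5: expected response at time X for group W (0 = control)."""
    return beta_00 + (beta_10 + beta_11 * W) * X

def var_Y(X, tau_00=0.5, tau_01=0.1, tau_11=0.2, sigma2=0.5):
    """Eq. 6: marginal variance of Y at time X."""
    return tau_00 + 2 * X * tau_01 + X**2 * tau_11 + sigma2

def cov_Y(X, Xp, tau_00=0.5, tau_01=0.1, tau_11=0.2):
    """Eq. 7: covariance between measurements taken at times X and X'."""
    return tau_00 + (X + Xp) * tau_01 + X * Xp * tau_11
```

Note that at baseline (X = 0) the variance in Eq. 6 reduces to τ00 + σ2, the quantity that is later standardized to 1.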

If there are reasons to suspect that changes in the expected value of the outcome will deviate from linearity over the duration of the study, more complex models of growth can be considered. For example, if the average outcome increases monotonically with time until the improvement stabilizes, then we might consider the following curvilinear growth model:

$$ {\displaystyle \begin{array}{l}{Y}_{it}={\beta}_{00}+\left({\beta}_{10}+{\beta}_{11}{W}_i\right){X}_{it}+\left({\beta}_{20}+{\beta}_{21}{W}_i\right){X}_{it}^2+\\ {}\kern.5em \left({u}_{0i}+{u}_{1i}{X}_{it}+{u}_{2i}{X}_{it}^2+{e}_{it}\right).\end{array}} $$
(8)

Again, we can accept the groups as equivalent enough at the beginning of the study and omit a main effect of treatment from the model. To allow the intercepts (baselines) to differ by group, we add the dummy treatment variable W to the model of Eq. 8.

In the model of Eq. 8, the expected value, variance, and covariance of the measurements Yit, conditional on the explanatory variables, are now given by

$$ E\left({Y}_{it}\right)={\beta}_{00}+\left({\beta}_{10}+{\beta}_{11}{W}_i\right){X}_{it}+\left({\beta}_{20}+{\beta}_{21}{W}_i\right){X}_{it}^2, $$
(9)
$$ Var\left({Y}_{it}\right)={\tau}_{00}+2{X}_{it}{\tau}_{01}+{X}_{it}^2{\tau}_{11}+2{X}_{it}^2{\tau}_{02}+2{X}_{it}^3{\tau}_{12}+{X}_{it}^4{\tau}_{22}+{\sigma}^2, $$
(10)
$$ {\displaystyle \begin{array}{l} Cov\left({Y}_{it},{Y}_{i{t}^{\prime }}\right)={\tau}_{00}+\left({X}_{it}+{X}_{i{t}^{\prime }}\right){\tau}_{01}+{X}_{it}{X}_{i{t}^{\prime }}{\tau}_{11}+\\ {}\kern10em \left({X}_{it}^2+{X}_{i{t}^{\prime}}^2\right){\tau}_{02}+\left({X}_{it}{X}_{i{t}^{\prime}}^2+{X}_{it}^2{X}_{i{t}^{\prime }}\right){\tau}_{12}+{X}_{it}^2{X}_{i{t}^{\prime}}^2{\tau}_{22}.\end{array}} $$
(11)

Equation 9 should also include the term β01Wi when the baseline mean responses are not assumed equal.

To simplify the calculations further, it is useful to re-express Eqs. 4 and 8 of the multilevel model in terms of matrices and vectors, as follows:

$$ {\mathbf{y}}_i={\mathbf{X}}_i\beta +{\mathbf{Z}}_i{\mathbf{u}}_i+{\mathbf{e}}_i, $$
(12)

where yi is a T × 1 vector of repeated observations for the ith subject, Xi(=ZiAi) is a (T × P) design matrix for the fixed effects, β is a (P × 1) vector of fixed effects, Zi is a (T × Q) design matrix for the random effects, ui is a (Q × 1) vector of random effects, and ei is a (T × 1) vector of errors. Here, Zi is a within-subjects design matrix that characterizes how each subject’s mean response changes over time, and Ai is a (Q × P) between-subjects design matrix that contains time-invariant explanatory variables.

With respect to the errors and random effects, it is assumed that the vectors ei and ui are normally distributed with mean 0 and variance-covariance matrices Ri and T, respectively. Matrix Ri may take various forms; however, it is common to assume a model of conditional independence, that is, Ri = σ2IT, where IT is a T × T identity matrix. These assumptions imply that, marginally, \( {\mathbf{y}}_i\sim N\left({\mathbf{X}}_i\upbeta, {\mathbf{V}}_i={\mathbf{Z}}_i{\mathbf{TZ}}_i^{\prime }+{\mathbf{R}}_i\right) \). When Vi is known, the generalized least squares estimator of the vector β is given by \( \hat{\upbeta}={\left({\sum}_{i=1}^N{\mathbf{X}}_i^{\prime }{\mathbf{V}}_i^{-1}{\mathbf{X}}_i\right)}^{-1}{\sum}_{i=1}^N{\mathbf{X}}_i^{\prime }{\mathbf{V}}_i^{-1}{\mathbf{y}}_i \) and its variance by \( {\left({\sum}_{i=1}^N{\mathbf{X}}_i^{\prime }{\mathbf{V}}_i^{-1}{\mathbf{X}}_i\right)}^{-1} \). In the usual case in which Vi is unknown, an approximation to the true covariance is obtained by replacing Vi with its estimator \( \hat{{\mathbf{V}}_i} \).

Equations 5–7 and 9–11 are essential for planning a longitudinal study properly since, as we shall see later, they provide the machinery that allows us to carry out a correct power analysis. To estimate the sample size required to detect a statistically significant group-by-time interaction effect, it is necessary to specify the values of the parameters included in Eqs. 1–3 of the model. However, such a task is neither easy nor straightforward, given that in many cases it is impossible to surmise the values of the parameters without running the experiment. Hence, in practice, the use of existing methods for calculating the sample size is limited to situations in which researchers are able to anticipate a range of probable values of the parameters of interest from the results obtained in previous studies.

In an attempt to focus the power analysis in studies in which linear growth is assumed, Usami (2014) suggested transforming the variance components associated with the model of Eq. 4 and the parameter related to the treatment (i.e., β11) into statistical indices whose values could reasonably be specified in advance. These are the reliability of the measure at baseline (ρ1), the standardized effect size at the last time point (dL), the Level 2 residual correlation (r1), and the ratio of the within-group variance of the outcome at the end of the study to that at the beginning (k1). Formally,

$$ {\uprho}_1=\frac{Var\left({u}_{0i}\right)}{Var\left({u}_{0i}+{e}_{it}\right)}=\frac{\tau_{00}}{\tau_{00}+{\sigma}^2}, $$
(13)
$$ {d}_L=\frac{E\left({Y}_{iT}\left|{W}_i=1\right.\right)-E\left({Y}_{iT}\left|{W}_i=0\right.\right)}{\sqrt{Var\left({Y}_{iT}\right)}}=\frac{D{\upbeta}_{11}}{\sqrt{\tau_{00}+2D{\tau}_{01}+{D}^2{\tau}_{11}+{\sigma}^2}}, $$
(14)
$$ {r}_1=\frac{Cov\left({u}_{0i},{u}_{1i}\right)}{\sqrt{Var\left({u}_{0i}\right) Var\left({u}_{1i}\right)}}=\frac{\tau_{01}}{\sqrt{\tau_{00}{\tau}_{11}}}, $$
(15)

and

$$ {k}_1=\frac{Var\left({Y}_{iT}\right)}{Var\left({Y}_{i1}\right)}=\frac{\tau_{00}+2D{\tau}_{01}+{D}^2{\tau}_{11}+{\sigma}^2}{\tau_{00}+{\sigma}^2}. $$
(16)

It is important to note that the effect size parameter of Eq. 14 depends on the sum β01 + Dβ11, rather than on β11 alone, when β01 ≠ 0.

By solving Eqs. 15 and 16 simultaneously, the following components of variance and covariance are obtained (see Appendix 2):

$$ {\tau}_{01}=\frac{-{r}_1^2{\tau}_{00}+{r}_1\sqrt{r_1^2{\tau}_{00}^2+{\tau}_{00}\left({k}_1-1\right)\left({\tau}_{00}+{\sigma}^2\right)}}{D}, $$
(17)
$$ {\tau}_{11}=\frac{2{r}_1^2{\tau}_{00}+\left({k}_1-1\right)\left({\tau}_{00}+{\sigma}^2\right)-2{r}_1\sqrt{r_1^2{\tau}_{00}^2+{\tau}_{00}\left({k}_1-1\right)\left({\tau}_{00}+{\sigma}^2\right)}}{D^2}. $$
(18)

At the same time, by replacing Var(YiT) in Eq. 14 with the value found for it in Eq. 16, the coefficient associated with the effect of linear treatment can be written as:

$$ {\upbeta}_{11}=\frac{d_L\sqrt{k_1\left({\tau}_{00}+{\sigma}^2\right)}}{D}. $$
(19)

Please note that if β01 ≠ 0, then \( {\upbeta}_{11}=\left(-{\beta}_{01}+{d}_L\sqrt{k_1\left({\tau}_{00}+{\sigma}^2\right)}\right)/D. \)

Without loss of generality, we can assume that the variance of the initial outcome is equal to 1 (i.e., τ00 + σ2 = 1). In this case, Eqs. 13–19 reduce to those given by Usami (2014). The restriction above makes it possible to calculate the parameters of the model by specifying the values of ρ1, dL, r1, and k1. It should be noted that these indices can be specified intuitively, which largely circumvents the difficulty, common in exploratory studies, of defining the values of the parameters before running the experiment. In addition, Usami found that the indices ρ1, r1, and k1 have less influence on the sample size calculation than does dL, in particular when dL > 0.4.
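Under the standardization τ00 + σ2 = 1, the mapping from the four indices to the model parameters in Eqs. 17–19 can be sketched as follows. The function name and the index values in the example are ours, chosen arbitrarily for illustration:

```python
import math

def components_from_indices(rho1, dL, r1, k1, D):
    """Recover tau_01, tau_11 (Eqs. 17-18) and beta_11 (Eq. 19) from the
    indices rho1, dL, r1, k1, assuming tau_00 + sigma^2 = 1."""
    tau_00 = rho1                     # Eq. 13 under the standardization
    sigma2 = 1.0 - rho1
    # Common square-root term of Eqs. 17-18 with tau_00 + sigma^2 = 1:
    s = math.sqrt(r1**2 * tau_00**2 + tau_00 * (k1 - 1.0))
    tau_01 = (-r1**2 * tau_00 + r1 * s) / D                           # Eq. 17
    tau_11 = (2 * r1**2 * tau_00 + (k1 - 1.0) - 2 * r1 * s) / D**2    # Eq. 18
    beta_11 = dL * math.sqrt(k1) / D                                  # Eq. 19
    return tau_00, sigma2, tau_01, tau_11, beta_11
```

Substituting the recovered components back into Eqs. 15 and 16 reproduces r1 and k1 exactly, which provides a quick sanity check on the algebra.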

So far, we have focused on a series of formulas derived in order to run a prospective analysis of power in models that assume linear growth. However, this approach can be extended to more complex curvilinear growth models, including polynomial and piecewise growth models. For instance, the outcome may follow a quadratic trend that would require the inclusion of the second-order treatment effect in the model (see Eq. 8).

The calculation of an appropriate sample size for detecting curvature in growth rates relies on transforming the model parameters (i.e., τ02, τ12, τ22, and β21) into indices that can be specified from a literature review and conjecture. In addition to those specified in Eqs. 13–16, this new situation requires four additional indices. Using the results of Eqs. 9–11, these are defined as follows:

$$ {\displaystyle \begin{array}{c}{d}_Q=\frac{E\left({Y}_{iT}|{W}_i=1\right)-E\left({Y}_{iT}|{W}_i=0\right)}{\sqrt{Var\left({Y}_{iT}\right)}}\\ {}=\frac{D{\beta}_{11}+{D}^2{\beta}_{21}}{\sqrt{\tau_{00}+2D{\tau}_{01}+{D}^2{\tau}_{11}+2{D}^2{\tau}_{02}+2{D}^3{\tau}_{12}+{D}^4{\tau}_{22}+{\sigma}^2}},\end{array}} $$
(20)
$$ {r}_2=\frac{Cov\left({u}_{0i},{u}_{2i}\right)}{\sqrt{Var\left({u}_{0i}\right) Var\left({u}_{2i}\right)}}=\frac{\tau_{02}}{\sqrt{\tau_{00}{\tau}_{22}}}, $$
(21)
$$ {r}_{12}=\frac{Cov\left({u}_{1i},{u}_{2i}\right)}{\sqrt{Var\left({u}_{1i}\right) Var\left({u}_{2i}\right)}}=\frac{\tau_{12}}{\sqrt{\tau_{11}{\tau}_{22}}}, $$
(22)

and

$$ {k}_2=\frac{Var\left({Y}_{iT}\right)}{Var\left({Y}_{i1}\right)}=\frac{\tau_{00}+2D{\tau}_{01}+{D}^2{\tau}_{11}+2{D}^2{\tau}_{02}+2{D}^3{\tau}_{12}+{D}^4{\tau}_{22}+{\sigma}^2}{\tau_{00}+{\sigma}^2}. $$
(23)

Again, it is important to note that the effect size parameter of Eq. 20 depends on the sum β01 + Dβ11 + D2β21, rather than on the sum Dβ11 + D2β21, when β01 ≠ 0.

By solving Equations 21–23 simultaneously, a series of equations of the form ax2 + bx + c = 0 is obtained (see Appendix 2). The solutions, or roots, which correspond to the variance components we seek, can be obtained by solving each quadratic equation using the familiar formula of Bhaskara (cf. Puttaswamy, 2012):

$$ {\tau}_{02}=\frac{-{B}_{02}\pm \sqrt{B_{02}^2-4{A}_{02}{C}_{02}}}{2{A}_{02}}, $$
(24)

where

$$ {\displaystyle \begin{array}{l}{A}_{02}={D}^4;\kern0.5em {B}_{02}=2{D}^2{r}_2^2{\tau}_{00}+2{D}^3{r}_{12}\sqrt{\left({\tau}_{11}/{\tau}_{00}\right)}{r}_2{\tau}_{00};\kern0.5em {C}_{02}=2D{\tau}_{01}{r}_2^2{\tau}_{00}+{D}^2{\tau}_{11}{r}_2^2{\tau}_{00}-\\ {}\left({k}_2-1\right)\left({\tau}_{00}+{\sigma}^2\right){r}_2^2{\tau}_{00};\\ {}{\tau}_{12}=\frac{-{B}_{12}\pm \sqrt{B_{12}^2-4{A}_{12}{C}_{12}}}{2{A}_{12}},\end{array}} $$
(25)

where

$$ {\displaystyle \begin{array}{l}{A}_{12}={D}^4;\kern0.5em {B}_{12}=2{D}^2{r}_2\sqrt{\left({\tau}_{00}/{\tau}_{11}\right)}{r}_{12}{\tau}_{11}+2{D}^3{r}_{12}^2{\tau}_{11};\kern0.5em {C}_{12}=2D{\tau}_{01}{r}_{12}^2{\tau}_{11}+{D}^2{\tau}_{11}{r}_{12}^2{\tau}_{11}-\\ {}\left({k}_2-1\right)\left({\tau}_{00}+{\sigma}^2\right){r}_{12}^2{\tau}_{11};\ \mathrm{and}\\ {}{\tau}_{22}=\frac{-{B}_{22}\pm \sqrt{B_{22}^2-4{A}_{22}{C}_{22}}}{2{A}_{22}},\end{array}} $$
(26)

where

$$ {\displaystyle \begin{array}{l}{A}_{22}={D}^8;{B}_{22}=-{\left(2{D}^2{r}_2\sqrt{\tau_{00}}+{D}^3{r}_{12}\sqrt{\tau_{11}}\right)}^2+2{D}^6{\tau}_{11}+4{D}^5{\tau}_{01}-2{D}^4\left({k}_2-1\right)\left({\tau}_{00}+{\sigma}^2\right);\\ {}{C}_{22}=4{D}^2{\tau}_{01}^2+{D}^4{\tau}_{11}^2+{\left({k}_2-1\right)}^2{\left({\tau}_{00}+{\sigma}^2\right)}^2+4{D}^3{\tau}_{01}{\tau}_{11}-2\left({k}_2-1\right)\left({\tau}_{00}+{\sigma}^2\right)\left(2D{\tau}_{01}+{D}^2{\tau}_{11}\right).\end{array}} $$
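Each of Eqs. 24–26 is solved with the same quadratic (Bhaskara) formula once the corresponding A, B, C coefficients are in hand; a generic helper makes this explicit (the function name is ours):

```python
import math

def quadratic_roots(a, b, c):
    """Both roots of a*x^2 + b*x + c = 0, as used for tau_02, tau_12,
    and tau_22 in Eqs. 24-26 after computing the A, B, C coefficients."""
    disc = b * b - 4 * a * c
    if disc < 0:
        raise ValueError("no real roots: check the index values supplied")
    r = math.sqrt(disc)
    return (-b + r) / (2 * a), (-b - r) / (2 * a)
```

Of the two roots, the admissible one is that yielding a positive-definite Level 2 covariance matrix.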

Finally, by substituting in Eq. 20 the value found for Var(YiT) in Eq. 23, the coefficient for the quadratic treatment effect can be written as

$$ {\upbeta}_{21}=\frac{d_Q\sqrt{k_2\left({\tau}_{00}+{\sigma}^2\right)}-{d}_L\sqrt{k_1\left({\tau}_{00}+{\sigma}^2\right)}}{D^2}. $$
(27)

In the presence of a main effect of the treatment W, the slope formula would have the same form as that provided in Eq. 27, because both dL and dQ contain information about β01.
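Equation 27 is straightforward to evaluate once the indices are fixed. The sketch below uses arbitrary illustrative index values; by construction, Dβ11 + D2β21 recovers the quadratic effect size dQ scaled by √(k2(τ00 + σ2)):

```python
import math

def beta_21(dL, dQ, k1, k2, tau_00, sigma2, D):
    """Eq. 27: quadratic treatment coefficient implied by the linear (dL)
    and quadratic (dQ) standardized effect sizes."""
    v1 = tau_00 + sigma2          # baseline variance of the outcome
    return (dQ * math.sqrt(k2 * v1) - dL * math.sqrt(k1 * v1)) / D**2
```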

Appendix 3 provides the machinery that allows us to carry out a correct power analysis using piecewise models. Because the data from many longitudinal studies can be well approximated using simple piecewise linear models with at most one or two knots located at judiciously chosen time points (Fitzmaurice, Laird, & Ware, 2011, p. 151), we present only a random two-slope piecewise model in which the entire growth period of the outcome under study is split into two parts: (1) linear growth from the baseline to the breakpoint, and (2) linear growth from the breakpoint to the last time point in the study. Obviously, when determining the sample size, the location of the breakpoint must be known ahead of time.

Estimation of the treatment effect and its variance

The goal of a longitudinal intervention study is to test whether there are differences between treatment conditions with respect to their average growth rates. If the change is conceptualized as a sustained linear process, then we must verify whether β11 ≠ 0. With two groups (e.g., experimental versus control), the OLS estimator of \( {\upbeta}_{11} \) can be expressed as:

$$ {\hat{\upbeta}}_{11}=\frac{\sum \limits_{i=1}^{N_E}\sum \limits_{t=1}^T\left({X}_{it}-\overline{X_i}\right){Y}_{it}}{\sum \limits_{i=1}^{N_E}\sum \limits_{t=1}^T{\left({X}_{it}-\overline{X_i}\right)}^2}-\frac{\sum \limits_{i=1}^{N_C}\sum \limits_{t=1}^T\left({X}_{it}-\overline{X_i}\right){Y}_{it}}{\sum \limits_{i=1}^{N_C}\sum \limits_{t=1}^T{\left({X}_{it}-\overline{X_i}\right)}^2}, $$
(28)

where NE and NC are the treatment and control group sample sizes, respectively. The generalization of Eq. 28 to more than one active treatment is not immediate, but it is straightforward to derive (see Appendix 4).

To test the interaction between the Level 1 variable (time) and the Level 2 variable (treatment), the variance of the estimator of β11 is required. Using Eqs. 6 and 7, and noting that the variance of a difference between independent groups reduces to the sum of their variances, ordinary algebra shows that (see Appendix 5):

$$ Var\left({\hat{\upbeta}}_{11}\right)=\frac{4}{N}\left(\frac{\sigma^2}{\sum_{t=1}^T{\left({X}_{it}-{\overline{X}}_i\right)}^2}+{\tau}_{11}\right), $$
(29)

where N (= NE + NC) denotes the total number of second-level units included in the study, with N/2 subjects in each group. The quantity 4/N on the right side of Eq. 29 should be replaced with 1/(Np1p2) to allow for groups of unequal size, where p1 = NC /N and p2 = NE /N.

If the T measures between X1 = 0 and XT = D are equally spaced, Eq. 29 can be reformulated as follows (see Fitzmaurice et al., 2011):

$$ Var\left({\hat{\upbeta}}_{11}\right)=\frac{4}{N}\left(\frac{12{\sigma}^2\left(T-1\right)}{D^2T\left(T+1\right)}+{\tau}_{11}\right), $$
(30)

where D = f−1(T − 1) and f is the frequency of observation per unit of time, whereas V1 (the first term in parentheses in Eq. 30) and τ11 denote the variability in growth rates within and across subjects, respectively. The sum V1 + τ11, denoted \( {\sigma}_{b1}^2 \) from here onward, is a measure of the variability of the OLS estimate of the slope of the model in Eq. 1.

When growth is assumed to be linear and f = 1 (i.e., Xt = 0, 1, 2, …, T − 1; T = D + 1), the sampling variance of the rate of change simplifies to V1 = 12σ2/(T3 − T). For more complex growth functions (e.g., a quadratic function) and f ≠ 1 (e.g., Xt = 0, 2, 4, …, 2T − 2; T = fD + 1), Raudenbush and Liu (2001) showed that the sampling variance of the polynomial slope takes the following form:

$$ {V}_p=\frac{\sigma^2{f}^{2p}\left(T-p-1\right)!}{l_p\left(T+p\right)!}, $$
(31)

where p denotes the polynomial order of the change in the outcome and lp is a constant whose value depends on the way of coding the time variable (sequential, centered, or orthogonal). To model nonlinear relations across time, it is beneficial to use orthogonal polynomials, since this reduces the collinearity that can result from using powers of t as regressors.
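For equally spaced occasions with f = 1, the closed form V1 = 12σ2/(T3 − T) agrees with the direct computation σ2/Σt(Xt − X̄)2, as this small sketch verifies (function names are ours):

```python
def slope_sampling_variance(sigma2, T):
    """V1 = sigma^2 / sum_t (X_t - Xbar)^2 for X_t = 0, 1, ..., T-1."""
    X = list(range(T))
    xbar = sum(X) / T
    return sigma2 / sum((x - xbar) ** 2 for x in X)

def slope_sampling_variance_closed(sigma2, T):
    """Closed form from the text: 12*sigma^2 / (T^3 - T)."""
    return 12 * sigma2 / (T**3 - T)
```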

Alternatively, the variance of any trend of interest (e.g., linear, quadratic, or cubic), regardless of the form assumed to characterize the covariance structure of measurement error, can be more easily obtained from the appropriate diagonal element of

$$ Cov\left({\hat{\mathbf{b}}}_i\right)={\left({\mathbf{Z}}_i^{\prime }{\mathbf{V}}_i^{-1}{\mathbf{Z}}_i\right)}^{-1}, $$
(32)

where Zi is a design matrix that specifies the change of outcome of any subject across the study (i.e., a constant, linear, quadratic, etc., function), \( {\mathbf{V}}_i\left(={\mathbf{Z}}_i{\mathbf{T}}_i{\mathbf{Z}}_i^{\prime }+{\mathbf{R}}_i\right) \) is the covariance matrix of repeated measurements, Ti is the dispersion matrix of Level 2 random effects, and Ri is the covariance structure of Level 1 errors.

Additionally, a quick and easy way to assess the effects that D and f will have on the power using the matrix formulation of the model is to divide the linear trend component of matrix Zi by f, the quadratic trend component by f2, the cubic trend component by f3, and so on. Very often f = 1, but depending on the value of D, other values are possible (e.g., f = 0.5 or f = 2).
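Equation 32 is easy to evaluate numerically. The sketch below assumes a linear growth model with T = 5 equally spaced occasions and illustrative variance components of our choosing; for this case the resulting slope variance coincides with V1 + τ11, the term in parentheses in Eq. 30:

```python
import numpy as np

# Illustrative setup for Eq. 32: linear growth, T = 5, placeholder components.
T = 5
tau_00, tau_01, tau_11, sigma2 = 0.6, 0.05, 0.04, 0.4

X = np.arange(T, dtype=float)            # occasions 0, 1, ..., T-1
Z = np.column_stack([np.ones(T), X])     # intercept and linear trend columns
Tmat = np.array([[tau_00, tau_01],
                 [tau_01, tau_11]])      # Level 2 covariance matrix T
R = sigma2 * np.eye(T)                   # conditional independence: R = sigma2 * I
V = Z @ Tmat @ Z.T + R                   # covariance of the repeated measures

cov_b = np.linalg.inv(Z.T @ np.linalg.inv(V) @ Z)   # Eq. 32
var_slope = cov_b[1, 1]                  # variance of the linear trend estimate
```

With these values, var_slope equals 12σ2/(T3 − T) + τ11 = 0.08, so the matrix route and the scalar formula agree.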

Statistical power analysis

The power to detect a specified treatment difference is defined as the probability of rejecting the null hypothesis of no treatment-by-linear-trend interaction H0 : β11 = 0, given that it is in fact false (β11 ≠ 0). Using Eqs. 28 and 30, this hypothesis can be tested with:

$$ {F}_0=\frac{{\hat{\upbeta}}_{11}^2}{Var\left({\hat{\upbeta}}_{11}\right)}, $$
(33)

where \( Var\left({\hat{\upbeta}}_{11}\right)={\sigma}_{b1}^2/\left({Np}_1{p}_2\right) \), p1 = NC /N, p2 = NE/N, and N = NC + NE. The F0 statistic follows the central F distribution when H0 is true; when H0 is false, it follows the noncentral F distribution with df1 degrees of freedom in the numerator, df2 degrees of freedom in the denominator, and noncentrality parameter λ, which is defined as

$$ \lambda =\frac{Np_1{p}_2{\beta}_{11}^2}{\sigma_{b1}^2}. $$
(34)

This strategy is both feasible and straightforward for studies in which there is good reason to assume that the groups have equal variances. However, as we previously indicated, it is possible that the assumption of Level 1 and/or Level 2 homogeneity of variances will be violated (see the example described in Vallejo, Fernández, Cuesta, & Livacic-Rojas, 2015, for details). Under the most general scenario, the noncentrality parameter is given by

$$ {\lambda}^{\ast }=\frac{N{p}_1{p}_2{\beta}_{11}^2}{\sigma_{b1(C)}^2+{\sigma}_{b1(E)}^2}, $$
(35)

where \( {\upsigma}_{b1(C)}^2={p}_2\left[12{\upsigma}_{(C)}^2/\left({T}^3-T\right)+{\uptau}_{11}^{(C)}\right] \) and \( {\upsigma}_{b1(E)}^2={p}_1\left[12{\upsigma}_{(E)}^2/\left({T}^3-T\right)+{\uptau}_{11}^{(E)}\right]. \)
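A sketch of Eq. 35; when the two groups share the same variance components, it reduces to the homogeneous case of Eq. 34, which provides a useful check. The argument names are ours, not notation from the article:

```python
def lambda_het(N, p1, p2, beta_11, sigma2_C, sigma2_E, tau11_C, tau11_E, T):
    """Eq. 35: noncentrality parameter allowing the Level 1 (sigma^2) and
    Level 2 (tau_11) variance components to differ across groups."""
    sb1_C = p2 * (12 * sigma2_C / (T**3 - T) + tau11_C)
    sb1_E = p1 * (12 * sigma2_E / (T**3 - T) + tau11_E)
    return N * p1 * p2 * beta_11**2 / (sb1_C + sb1_E)
```

As expected, inflating one group's Level 1 variance lowers the noncentrality parameter and hence the power.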

Regardless of the values of f and D and of the number of groups to be compared, and even in the possible presence of heterogeneity, λ can also be computed using a method similar to the one that Shieh (2003) suggested under the multivariate general linear model. Specifically,

$$ \lambda = tr\left[{\left({\mathbf{AVA}}^{\prime}\right)}^{-1}{\left({\mathbf{C}\mathbf{BA}}^{\prime}\right)}^{\prime }{\left({\mathbf{C}\mathbf{M}}^{-1}{\mathbf{C}}^{\prime}\right)}^{-1}\left({\mathbf{C}\mathbf{BA}}^{\prime}\right)\right], $$
(36)

where tr denotes the trace of the matrix [⋅], A = (1NG| − 1NG) and C = (1NG − 1| − ING − 1) are between-subjects contrast matrices of full row rank, 1NG is a column vector of ones, I is an identity matrix, and the symbol | represents the augmented matrix resulting from appending the columns of the matrices. The matrix of expected values across the T measurements, B = [μ(C)0…μ(C)T − 1; μ(E)0…μ(E)T − 1], can easily be obtained from Eq. 5 by fixing β00 = β10 = 0; M is a diagonal matrix whose elements are the numbers of subjects in each group [in our case, M = diag(NC, NE)]; and the V matrix is constructed using Eqs. 6 and 7. If the group variance components are heterogeneous, then V = p2V(C) + p1V(E). The method described for computing λ is limited to the model of Eq. 4; however, nothing prevents it from being extended to other contexts. For example, under the model of Eq. 8, one would proceed in a similar way, but using Eqs. 9–11.

That said, the procedure used here to calculate the power of the statistical test F0 to compare groups in terms of linear rates of change involves the following steps:

  1.

    Define the significance level α and sample sizes of the control and experimental groups—that is, NC and NE. Without loss of generality, we can establish that β00 = β10 = 0 (or, alternatively, β00 = β10 = β20 = 0, in the case of the quadratic growth model).

  2.

    Set the values of the indices ρ1, dL, r1, and k1 (or, alternatively, ρ1, dL, dQ, r1, r2, r12, k1, and k2, in the case of the quadratic growth model), thereby determining the values of the parameters σ2, τ00, τ01, τ11, and β11 (or σ2, τ00, τ01, τ11, τ02, τ12, τ22, β11, and β21, in the case of the quadratic growth model), and calculate the λ parameter defined in Eqs. 34–36.

  3.

    Specify the critical value using the inverse of the central F distribution function, namely:

    $$ {F}_c=\mathrm{FINV}\left(1-\alpha, {df}_1,{df}_2\right). $$
  4.

    Calculate the probability that the F0 ratio exceeds the critical value Fc when H0 is false. Under the alternative hypothesis (H1), the power function associated with the F0 test is given by 1 − β = P[F(df1, df2, λ) > Fc], where F(df1, df2, λ) denotes a noncentral F random variable with degrees of freedom (df1, df2) and noncentrality parameter λ, and β denotes the probability of a Type II error.
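The four steps above condense into a few lines of code. The sketch below (function name is ours) uses SciPy's central F distribution for the critical value and its noncentral F distribution for the power:

```python
from scipy.stats import f, ncf

def power_f_test(alpha, df1, df2, lam):
    """Steps 3-4: critical value from the central F, power from the noncentral F."""
    f_c = f.ppf(1 - alpha, df1, df2)    # F_c = FINV(1 - alpha, df1, df2)
    return ncf.sf(f_c, df1, df2, lam)   # 1 - beta = P[F(df1, df2, lam) > F_c]
```

For df1 = 1 and large df2, λ = (Z1 − (α/2) + Z1 − β)² ≈ 7.85 recovers the familiar 80% power at α = .05, which ties this computation to Eq. 39 below.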

Determination of sample size

There are several approaches to determining the sample size, including Bayesian and frequentist methods that focus on estimation instead of hypothesis testing. However, the most popular approach involves calculating the power of a statistical test, that is, the probability of rejecting H0 when H1 is true.

Required sample size for two groups

Let us assume that we want to determine the sample size needed to detect differences between two groups. The hypothesis H0 : β11 = 0 is rejected if the estimator of β11 exceeds the critical value \( \left({\hat{\beta}}_{11}>c\right) \). In accordance with Amatya, Bhaumik, and Gibbons (2013), this value defines the boundary between the acceptance and rejection regions and is set under the following two conditions:

$$ P\left({\hat{\beta}}_{11}>c=0+{Z}_{1-\left(\alpha /2\right)}\sqrt{{\left({Np}_1{p}_2\right)}^{-1}{\sigma}_{b1}^2}\;|\;{\mathrm{H}}_0\;\mathrm{true}\right)=\alpha, $$
(37)
$$ P\left({\hat{\beta}}_{11}>c={\beta}_{11}-{Z}_{1-\beta}\sqrt{{\left({Np}_1{p}_2\right)}^{-1}{\sigma}_{b1}^2}\;|\;{\mathrm{H}}_1\;\mathrm{true}\right)=1-\beta . $$
(38)

Equating Eqs. 37 and 38, since the critical value c is assumed identical under both statistical hypotheses, and solving for N, we obtain the formula that informs us of the sample size required in order to achieve the desired power (see Appendix 6). Specifically,

$$ N=\frac{{\left({Z}_{1-\left(\alpha /2\right)}+{Z}_{1-\beta}\right)}^2{\sigma}_{b1}^2}{\beta_{11}^2{p}_1{p}_2}, $$
(39)

where Z1 − (α/2) and Z1 − β are the 100(1 − α/2) and 100(1 − β) percentiles of the standard normal distribution for a two-sided test.
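Eq. 39 is straightforward to evaluate directly. The helper below (our own, using SciPy for the normal quantiles) rounds the result up to the next whole subject:

```python
import math
from scipy.stats import norm

def total_n(beta11, sigma2_b1, p1=0.5, p2=0.5, alpha=0.05, power=0.80):
    """Eq. 39: total sample size to detect a slope difference of beta11."""
    z = norm.ppf(1 - alpha / 2) + norm.ppf(power)   # Z_{1-(alpha/2)} + Z_{1-beta}
    return math.ceil(z**2 * sigma2_b1 / (beta11**2 * p1 * p2))
```

With the slope-variance and effect estimates reported in the first empirical example below (σ²b1 = .0223, β11 = .0804, equal group proportions), this returns the total of 109 subjects quoted there.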

Required sample size for multiple groups

Determining the sample size needed to compare the trends of an arbitrary number of groups is a relatively simple procedure, but one that is seldom documented in longitudinal studies. For this purpose, Eq. 39 can be rewritten as

$$ N=\frac{{\left({Z}_{1-\left(\alpha /2\right)}+{Z}_{1-\beta}\right)}^2}{1/\mathrm{tr}\left[{\left({\mathbf{AVA}}^{\prime}\right)}^{-1}{\left({\mathbf{C}\mathbf{BA}}^{\prime}\right)}^{\prime }{\left({\mathbf{C}\mathbf{P}}^{-1}{\mathbf{C}}^{\prime}\right)}^{-1}\left({\mathbf{C}\mathbf{BA}}^{\prime}\right)\right]}, $$
(40)

where P = diag(p1, p2, …, pJ). The remaining terms have been defined previously.

Required sample size for two or more groups with unequal variances

The sample size calculation specified in Eq. 39 assumes homogeneous errors at both Levels 1 and 2. When it is suspected that the variance components may differ depending on the subjects' participation in the training program, the required sample size becomes

$$ {N}^{\ast }=\left(\frac{{\left[{Z}_{1-\left(\alpha /2\right)}+{Z}_{1-\beta}\right]}^2}{\beta_{11}^2{p}_1{p}_2}\right)\left({p}_2{\sigma}_{b1}^{2(C)}+{p}_1{\sigma}_{b1}^{2(E)}\right). $$
(41)

As was the case for the homogeneous model, determining the sample size in models with heterogeneous variances and an arbitrary number of groups also requires modifying Eq. 41. For example, the value of N* needed to detect differences among the trends of three groups can be obtained as

$$ {N}^{\ast }=\frac{{\left({Z}_{1-\left(\alpha /2\right)}+{Z}_{1-\beta}\right)}^2}{1/\mathrm{tr}\left[{\left({\mathbf{A}{\mathbf{V}}^{\ast }{\mathbf{A}}^{\prime}}\right)}^{-1}{\left({\mathbf{C}\mathbf{BA}}^{\prime}\right)}^{\prime }{\left({\mathbf{C}\mathbf{P}}^{-1}{\mathbf{C}}^{\prime}\right)}^{-1}\left({\mathbf{C}\mathbf{BA}}^{\prime}\right)\right]}, $$
(42)

where V* = p1V1 + p2V2 + p3V3. If it is suspected that the treatment groups are unbalanced, then V* = [(p2p3)/p*]V1 + [(p1p3)/p*]V2 + [(p1p2)/p*]V3, with p* = p1p2 + p1p3 + p2p3.
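The two weighting schemes just given can be sketched as follows (a hypothetical helper; Vs holds the three per-group covariance matrices and ps the group proportions):

```python
import numpy as np

def pooled_v(Vs, ps, balanced=True):
    """Pooled V* for three groups, per the weights given in the text."""
    (V1, V2, V3), (p1, p2, p3) = Vs, ps
    if balanced:
        return p1 * V1 + p2 * V2 + p3 * V3
    p_star = p1 * p2 + p1 * p3 + p2 * p3           # p* = p1p2 + p1p3 + p2p3
    return (p2 * p3 * V1 + p1 * p3 * V2 + p1 * p2 * V3) / p_star
```

A useful check on the weights is that the two versions coincide when the three proportions are equal.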

Required sample size for missing data

So far we have focused on how to determine the sample size assuming complete cases. However, dropout (also called attrition) is an inevitable problem in most longitudinal studies. The occurrence of missing values can produce biased estimates and can reduce statistical power, leading to inefficient analyses and invalid conclusions. When the rate of attrition is anticipated, a required sample size may be calculated on the basis of the final number of subjects that are expected to complete the study.

In the case of missing data, the formula described above to calculate the variance in the slopes of the subjects, \( {\sigma}_b^2= Var\left(\hat{b_i}\right) \), may no longer be applicable or may not be realistic (Fitzmaurice et al., 2011). For this reason, we need a solution that mitigates the negative impact exerted by the attrition of the sample on the validity of the inferences and of the conclusions reached.

A reasonable way to model early departure from a study is to divide, element by element, the Vi matrix of Eq. 32 by a matrix L that identifies the missing-data pattern. In this regard, O'Kelly and Ratitch (2014) clarified that in health-related studies it is more common for subjects to leave a study permanently than temporarily. In this situation (attrition, or definitive dropout), the variance of the estimated rate of change can be obtained from the appropriate diagonal element of

$$ Cov\left({b}_i^{\ast}\right)={\left[{\mathbf{Z}}_i^{\prime}\left({\mathbf{V}}_i^{-1}\oslash \mathbf{L}\right){\mathbf{Z}}_i\right]}^{-1}, $$
(43)

where ⊘ denotes the Hadamard (element-wise) division operator.

The choice of the L matrix depends on the dropout model we wish to emphasize. However, if we are interested in modeling the pattern of missingness found most frequently in applied research, the monotone pattern, a reasonable choice of L is one in which each element of the main diagonal gives the proportion of subjects who remain in the study over time (i.e., 1, r, r2, . . . , rT − 1) and the remaining elements equal the assumed survival rate (i.e., r). For the homogeneous model, the suggested procedure provides results similar to those obtained using the method described by Hedeker, Gibbons, and Waternaux (1999).
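A minimal sketch of this adjustment is given below. The function name is ours, and we read the Hadamard step as inflating Vi element-wise by the retention matrix L before inverting, in line with the narrative description of dividing Vi by L; with r = 1 (no attrition), it reduces to the complete-data covariance [Z′V⁻¹Z]⁻¹.

```python
import numpy as np

def attrition_cov(Z, V, r):
    """Attrition-adjusted Cov(b*) with the monotone-dropout L described above."""
    T = V.shape[0]
    L = np.full((T, T), r)                       # off-diagonal: survival rate r
    L[np.diag_indices(T)] = r ** np.arange(T)    # diagonal: 1, r, r^2, ..., r^{T-1}
    W = np.linalg.inv(V / L)                     # Hadamard division of V by L
    return np.linalg.inv(Z.T @ W @ Z)            # slope variance: element [1, 1]
```

The appropriate diagonal element (here [1, 1] for a column of ones followed by the time scores in Z) then plays the role of σ²b1 in the sample size formulas.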

Method

Theoretical and Monte Carlo studies were conducted in order to determine the optimal sample size (N) for a study that ensures adequate statistical power for rejecting the null hypothesis of β11 = 0, as well as the accuracy of the estimates, assuming homogeneous (V2 = V1) or heterogeneous (V2 = 2V1) group variances at each of the levels of the model and missing data due to subjects dropping out after baseline but before completing the study. For this purpose, we proceeded as follows. Initially, using the formulas derived in Eqs. 39 and 41, we carried out a theoretical study to examine the effects of heterogeneity and attrition on determining the appropriate N when the significance level was α = 0.05 and the nominal statistical power was 1 − β = 0.80. Five factors were manipulated and completely crossed in the study, for a total of 108 investigated conditions: reliability of measurement at the first time point (ρ1 = 0.1, 0.5), Level 2 residual correlation (r1 = −0.5, 0, 0.5), number of repeated measurements (T = 4, 8), proportion of imbalance between the group sizes (Δ = 0.5, 0.35, 0.2), and standardized effect size at the last time point (dL = 0.4, 0.5, 0.6). According to Cohen (1988), standardized mean differences of 0.2, 0.5, and 0.8 correspond to small, medium, and large effects, respectively. The ratio between the variances of the outcomes at the end and at the beginning of the study remained constant (k1 = 25) under each of the conditions. Later, a Monte Carlo study was carried out to verify the statistical power achieved with the estimated sample sizes.

Data generation

Datasets were simulated on the basis of the two-level model shown in Eqs. 1–3. At the first level, a continuous outcome was generated as a linear function of time. The intercept and one Level 1 variable were simulated to vary randomly as a function of treatment at the second level. The explanatory variables X and W were each generated as standard normal. Later, we dichotomized the W variable at an arbitrary threshold (i.e., the mean of all observed data). The error terms were generated as independent normal random variables with means of zero and the variances obtained from the values specified above for the manipulated factors. We used SAS version 9.4 (SAS, 2016) for the simulations.

For each of the 108 investigated conditions, 1,000 sets of raw data were generated and analyzed during the simulation process. In our simulation study, two different situations were considered: with no missing data at each of the time points and time-related dropout with cumulative missing data rates of 27% at the fourth occasion and 52% at the eighth occasion. Both with complete and with missing data, the analyses were carried out twice by REML methods using SAS PROC MIXED, once assuming homogeneity and once modeling the variances, in order to investigate the results of incorporating heterogeneity into the models.

Here we focus on sample size determination in the presence of a monotone missing-data pattern generated by a missing-at-random (MAR) mechanism. Under our MAR dropout mechanism, the data point for subject i was missing at time t and at all subsequent times if Uit < Φ[λt + Yi(t − 1)], where Uit is a uniform random variable and Φ is the standard normal cumulative distribution function. The values of λt were chosen to yield time-related dropout rates of 0%, 10%, 19%, and 27% for the four respective occasions, and of 0%, 10%, 19%, 27%, 34%, 41%, 47%, and 52% for the eight respective occasions.
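The dropout mechanism just described can be sketched as follows. The function name and the λt values passed to it are hypothetical; in practice the λt must be calibrated against the marginal distribution of Y to hit target cumulative rates such as the 27% and 52% above.

```python
import numpy as np
from scipy.stats import norm

def apply_dropout(Y, lam_t, rng):
    """Monotone MAR dropout: Y[i, t] and all later values become missing
    when U_it < Phi(lam_t[t] + Y_i(t-1)); lam_t[0] is unused (no baseline dropout)."""
    Y = Y.astype(float).copy()
    n, T = Y.shape
    for i in range(n):
        for t in range(1, T):
            if rng.uniform() < norm.cdf(lam_t[t] + Y[i, t - 1]):
                Y[i, t:] = np.nan          # dropout is permanent (monotone)
                break
    return Y
```

Because missingness depends only on the previously observed response, the mechanism is MAR rather than NMAR, matching the assumption discussed in the Limitations section.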

Evaluation criteria

To determine the accuracy and precision of the strategies being compared (i.e., sample size calculations using the derived formulas based on OLS estimates, and simulations based on REML estimates), we examined their performance in terms of the following quantities:

  1.

    Relative bias. To determine whether a parameter tends to be over- or underestimated, we used the relative bias index. If the parameter of interest was φ = (1 − β), the percentage relative bias was \( 100\times \left[\left(E\left(\hat{\phi}\right)-\phi \right)/\phi \right] \), where \( E\left(\hat{\phi}\right) \) was computed as the average parameter estimate across valid replications. We were unable to find any formal criteria in the literature for when relative bias is too large, so in this article a relative bias of less than 10% was considered acceptable.

  2.

    Approximate 95% coverage rates. This refers to the number of times that the absolute difference between the theoretical and empirical power across the examined conditions falls outside approximately two standard errors (SEs). The SEs for the empirical estimates of power were computed as \( \sqrt{pq/m} \), where p is the theoretical power, q equals 1 − p, and m is the number of simulations carried out in the numerical experiment.
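Both criteria are one-line computations; the sketch below (names are ours) makes them concrete:

```python
import math

def relative_bias_pct(mean_estimate, true_value):
    """Percentage relative bias: 100 * (E(phi_hat) - phi) / phi."""
    return 100.0 * (mean_estimate - true_value) / true_value

def empirical_power_se(p, m):
    """SE of an empirical power estimate over m replications: sqrt(p*q/m)."""
    return math.sqrt(p * (1.0 - p) / m)
```

For example, with p = .80 and m = 1,000 replications the SE is about .0126, so a two-SE band around .80 spans roughly .775 to .825, the interval quoted in the Results.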

Results

Tables 1, 2, 3, and 4 show the sample sizes obtained by the proposed method to achieve theoretical power of at least 80% and the simulation-based empirical power estimates. Table 1 gives the results for complete data with homogeneous variances, Table 2 gives the results for complete data with heterogeneous variances, Table 3 gives the results for incomplete data with homogeneous variances, and Table 4 gives the results for incomplete data with heterogeneous variances. Hereafter, these are known as Scenarios A, B, C, and D, respectively.

Table 1 Sample sizes to obtain theoretical power of at least 80% and the empirical power, with complete data and homogeneous Level 1 and 2 variances
Table 2 Sample sizes to obtain theoretical power of at least 80% and the empirical power, with complete data and heterogeneous Level 1 and 2 variances
Table 3 Sample sizes to obtain theoretical power of at least 80% and the empirical power, with incomplete data and homogeneous Level 1 and 2 variances
Table 4 Sample sizes to obtain theoretical power of at least 80% and the empirical power, with incomplete data and heterogeneous Level 1 and 2 variances

As can be seen in Table 1, the sample size needed to achieve 80% power with a two-sided Type I error rate of 5% decreases substantially with small increases in the effect size at the last time point (dL), whereas the influences of the number of repeated measurements (T), the Level 2 residual correlation (r1), and the reliability of measurement at the first time point (ρ1) are less obvious. Although the effects of T, r1, and ρ1 on statistical power are relatively small, larger values of these factors have a positive effect on power. Table 1 also shows that the sample size increases with an increasing degree of imbalance between the group sizes. In fact, high levels of imbalance (i.e., Δ = .2) cause a notable increase in the sample size needed to maintain a statistical power of 80%. A similar tendency is observed for the same conditions under the remaining scenarios (i.e., B, C, and D).

Table 2 presents the results for complete data in the presence of heterogeneity of variances (Scenario B). When the sample size estimates of Table 1 are compared to those of Table 2, we find that the mere presence of a small degree of heterogeneity in the Level 1 and 2 random effects (V2 = 2V1) leads to a noticeable increase in the sample size necessary to achieve at least 80% power, even when the group sizes are equal. Table 3 lists the sample sizes necessary to reach the preset value of power when the assumption of the homogeneity of the Level 1 and 2 variances is satisfied but attrition is present (Scenario C). As we stated previously, in this study we assumed a dropout rate of 10% at each successive time point in each group. Compared to the case of equal variances and complete data (Scenario A), dropout rates of 10% over time require the sample size to increase by 20%–25% in order to reach a similar power. Finally, the sample sizes required to accommodate the dropout rate in the presence of heterogeneity of variances (Scenario D) are given in Table 4. All the results displayed in this table agree qualitatively with the previous findings; however, as one would expect, a larger sample is required under this scenario to reach the same level of power.

Table 5 shows the percentages of relative bias by ρ1, dL, and T, collapsed across Level 2 residual correlations (r1). The results yielded negligible levels of bias (on average, between ±0.05% and ±1.5% of the true population parameter) in the vast majority of the 108 conditions examined. The bias of the predicted theoretical power was always less than 1%, regardless of the investigated conditions, whereas the mean relative bias for the empirical estimates of power remained under 3.6% in all cells and exceeded 3% in only five cases. In fact, there were no statistically significant differences in bias for the power estimates in any of the simulated conditions.

Table 5 Percentages of relative bias for predicted theoretical and empirical powers

The empirical estimates of power can also be compared to the theoretical values stated in Tables 1, 2, 3 and 4. The highest absolute difference was .024 among the 108 conditions displayed in Table 1, .026 among the 108 conditions displayed in Table 2, .039 among the 108 conditions displayed in Table 3, and .038 among the 108 conditions displayed in Table 4. Under Scenarios A and B, the discrepancies between theoretical prediction and empirical results are negligible, since 99% of the power estimates fall within two standard deviation limits (i.e., between .775 and .825). On the other hand, our results also indicate that, for Scenarios C and D, about 85% of power estimates fall within the confidence intervals when T = 4, while only 5% of absolute differences were beyond two standard deviations when T = 8. Therefore, the derived formulas allow the user to rigorously determine the sample size required to yield a certain power for both complete and incomplete data, both assuming homogeneity and when incorporating heterogeneity into the multilevel model.

Empirical illustration using two real longitudinal data examples

To illustrate how the derived formulas for sample size calculation can be used to ensure that a study has adequate power to detect statistical significance under different models and conditions (e.g., linear and quadratic growth, homogeneous and heterogeneous variances, or complete and missing data), we rely on the data of two longitudinal studies carried out by Núñez, Rosário, Vallejo, and González-Pienda (2013) and Rosário et al. (2017). In the first study, a linear change model was a reasonable assumption, whereas in the second study a quadratic model provided a more suitable representation of the shape of change. Consistent with common practice in empirical applications of growth curve models, the Level 1 predictors (i.e., Time and/or Time2) are assumed to be free of measurement error; if errors do exist, they would generally attenuate the estimates of the regression coefficients relative to their population values.

The first example (Núñez et al., 2013) examined the effectiveness of a school-based mentoring program on students' self-regulated learning strategies. In this study, program effects were tested in 94 sixth-grade students randomly assigned to two experimental conditions and evaluated at the beginning of the study and after 3, 6, and 9 months. Thus, if we measure the passage of time quarterly, this design involves f = 1 (one observation per unit of time), D = 3 (the study lasts three quarters), and T = fD + 1 (four measurement occasions).

After reanalyzing the data of Núñez et al. (2013) using SAS PROC MIXED, without assuming that the groups' average responses were equal at baseline, the following estimates were obtained: \( {\hat{\tau}}_{00}=.0708 \), \( {\hat{\tau}}_{01}=.0048 \), \( {\hat{\tau}}_{11}=.0050 \), \( {\hat{\sigma}}^2=0.865 \), \( {\hat{\beta}}_{01}=.1169 \), and \( {\hat{\beta}}_{11}=.0804 \). Here, time was treated as a continuous variable centered on its overall mean, rather than as a classification variable, as in the original study. Substituting these estimates into Eqs. 13–16 yields estimates of the reliability of measurement at the first time point \( \left({\hat{\rho}}_1=.45\right) \), the standardized effect size at the last time point \( \left({\hat{d}}_L=.75\right) \), the proportion of variance of the outcomes between the first and last time points \( \left({\hat{k}}_1=1.47\right) \), and the slope-intercept correlation \( \left({\hat{r}}_1=.25\right) \). In turn, using Eqs. 30 and 34, the variance of the slope \( \left({\hat{\sigma}}_{b1}^2=.0223\right) \) and the noncentrality parameter \( \left(\hat{\lambda}=6.81\right) \) are estimated. Inspection of a table of the noncentral F distribution (see, e.g., Ato & Vallejo, 2015) at the .05 significance level with \( \hat{\lambda}=6.81 \) and (1, 280) degrees of freedom yields a power of \( \hat{\varphi}\cong .74 \). Standard software (e.g., SAS PROC IML) can also be used to estimate this value. Next, we removed 28 data points to yield approximate dropout rates of 0%, 5%, 9%, and 13% for the four time points. In this particular application, the variance of the slope, \( {\hat{\sigma}}_{b1}^2 \), was .0246 and the noncentrality parameter, \( \hat{\lambda} \), was 6.18. Using these results and tables of the noncentral F distribution, the power is found to be approximately .70. The corresponding estimates of \( {\sigma}_{b1}^2 \), λ, and φ with heterogeneous errors (ratio 1:3) were .0446, 3.4, and .45, respectively.

Given that a power below the often-mentioned benchmark of .80 (Cohen, 1988) was obtained in all three cases described, it was necessary to determine the new sample size that would have allowed us to replicate the differences between treatment conditions, with respect to their average linear growth rates, under each of the situations described. From Eq. 39, with Z1 − (α/2) = 1.96 and Z1 − β = .84, we see that the total sample sizes needed to achieve 80% power at a 5% significance level were 109, 120, and 217, respectively. So far, we have only considered power for comparing groups on linear rates of change; yet the rate of change can also be nonlinear.
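The three totals just reported can be reproduced from Eq. 39 with the estimates given above. The sketch below uses SciPy's normal quantiles rather than the rounded 1.96 and .84, which yields the same results after rounding up:

```python
import math
from scipy.stats import norm

z = norm.ppf(0.975) + norm.ppf(0.80)        # Z_{1-(alpha/2)} + Z_{1-beta}
beta11, p1, p2 = 0.0804, 0.5, 0.5           # slope difference and group proportions
for s2_b1 in (0.0223, 0.0246, 0.0446):      # complete, missing, heterogeneous cases
    n = math.ceil(z**2 * s2_b1 / (beta11**2 * p1 * p2))
    print(n)                                # prints 109, 120, 217 in turn
```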

Next we considered data from the longitudinal randomized design conducted by Rosário et al. (2017) with 182 fourth-grade students, which examined whether students' writing quality differed when they wrote journals on a weekly basis, as compared with a control group. In the study, the subjects were measured at baseline and weekly for up to 12 weeks. With regard to the quality of the writing compositions, Rosário et al. found that providing extra writing opportunities (i.e., writing journals) had a statistically significant impact on the instantaneous rate of change at a specific moment and on the curvature. Suppose that our interest lay in replicating the difference in the average acceleration rates between the two groups. We will first check whether there is sufficient statistical power to detect the described effects.

As in the previous example, we briefly considered three cases: a complete set of data with homogeneous errors, an incomplete set of data with homogeneous errors, and a complete set of data with heterogeneous errors. After analyzing the data using PROC MIXED, the following estimates were obtained: \( {\hat{\tau}}_{00}=45.0677 \), \( {\hat{\tau}}_{01}=1.0519 \), \( {\hat{\tau}}_{11}=.3254 \), \( {\hat{\tau}}_{02}=.2867 \), \( {\hat{\tau}}_{12}=.0081 \), \( {\hat{\tau}}_{22}=.0081 \), \( {\hat{\sigma}}^2=21.1842 \), \( {\hat{\beta}}_{11}=.2238 \), and \( {\hat{\beta}}_{21}=-.0446 \). Substituting these estimates into Eqs. 13, 20–23, 32, and 36, the indices and parameter estimates can be calculated as \( {\hat{\rho}}_1=.6802 \), \( {\hat{d}}_Q=-.3106 \), \( {\hat{k}}_1=1.3262 \), \( {\hat{k}}_2=2.1824 \), \( {\hat{r}}_1=-.2747 \), \( {\hat{r}}_2=-.4756 \), \( {\hat{r}}_{12}=-.1574 \), \( {\hat{\sigma}}_{b2}^2=.0186 \), and \( \hat{\lambda}=4.8533 \). Inspection of noncentral F tables at the .05 significance level with \( \hat{\lambda}=4.8533 \) and (1, 2178) degrees of freedom yields a power of \( \hat{\varphi}\cong .60 \). Removing 594 data points from the original study according to a monotone dropout pattern, representing 5% dropout, we obtained \( {\hat{\sigma}}_{b2}^2=.0257 \), \( \hat{\lambda}=3.5096 \), and \( \hat{\varphi}\cong .47 \). In the presence of heterogeneity of variances (ratio 1:3), however, we obtained \( {\hat{\sigma}}_{b2}^2=.0341 \), \( \hat{\lambda}=2.4267 \), and \( \hat{\varphi}=.34 \). According to the convention suggested by Cohen (1988), an unsatisfactory level of statistical power was obtained in all three cases. Thus, it was necessary to calculate the sample size that would have allowed us to replicate the differences between treatment conditions, with respect to their average acceleration rates, under each of the situations described. From Eq. 39, with Z1 − (α/2) = 1.96 and Z1 − β = .84, we established that the total sample sizes needed to ensure adequate power were 295, 408, and 589, respectively.

Although we have omitted the original data due to limitations of space, the databases for the two examples are available from the first author upon request, and Appendix 7 provides the SAS codes used to perform the sample size and power calculations for Examples 1 and 2.

Discussion and conclusion

Sample size calculations to provide specified power levels were performed in four different scenarios, each involving 108 treatment combinations, through the use of mathematical formulas and numerical simulations. Our results indicate that both the analytic and the empirical methods provide virtually identical estimates of power across all examined conditions. The empirical estimates were below the theoretical estimates in 124 of the 432 cells of the design (28.7%), but the differences were practically insignificant. As we mentioned above, the mean relative bias for the empirical estimates of power remained under 3.6% in all cells, and, with few exceptions, the estimates of power fell inside the boundaries of a 95% confidence interval for the theoretical values, suggesting that the trend described above is due to chance. Consistent with the results of Heo et al. (2013), the data indicate that the derived power formulas are well validated by the simulation studies, which show that the values of the theoretical power are very close to those of the empirical power.

In Scenario A, in which complete data across time and homogeneous variances were available, our results revealed that the effect size and a large degree of imbalance between the group sizes had decisive impacts on the sample size determination. For instance, when the groups had markedly different sizes (i.e., one group was four times the size of the other), the sample size needed to increase by approximately 50% in order to achieve the same power as in the balanced case; likewise, achieving with an effect size of .40 a power comparable to that for an effect size of .60 required an increase in sample size of close to 100%. Therefore, careful attention should be paid to the choice among possible population effect sizes and to unequal randomization when planning a study. A conservative approach would be to consider the most plausible effect sizes and choose the smallest among them. On the other hand, the effects of the correlation of the Level 2 residuals and of the reliability of measurement at the first time point were not trivial, but their consequences were much less severe. These results match, to a large degree, the numerical results reported by Usami (2014) using a method proposed by Satorra and Saris (1985) in the context of structural equation modeling.

In the remaining scenarios, our two main findings can be summarized as follows. Firstly, in the presence of heterogeneity in the Level 1 and 2 random effects, larger sample sizes are required in order to obtain the desired nominal power, even for complete and balanced data. One important caveat is that the results were only obtained by the proposed method under positively paired conditions. A positive pairing implies that the treatment condition that has the smallest number of subjects is associated with the smallest variance, whereas the opposite occurs for a negative pairing. Unfortunately, with an unbalanced design similar to that employed in our work (Livacic-Rojas, Vallejo, Fernández, & Tuero, 2017; Vallejo et al., 2008), the tendency to be conservative is worse under negatively paired conditions. The second finding is that, when there is attrition, sample size requirements can be quite large. As one can easily imagine, however, it is not clear what is sufficiently large with regard to sample size in order to make valid inferences about the parameter of interest. In many cases an increase of 5% or 10% may be sufficient, but depending on the expected rate of attrition, the appropriate percentage could vary. In the present study we observed that with dropout rates of 10% at every time point (e.g., a condition with eight time points would retain approximately 50% of the original sample at the last time point), the sample size would be required to increase by 20%–25% in order to reach a power that was equivalent to the case of complete data. In any case, when attrition is anticipated, the formulas we derived allow the power to be calculated on the basis of the final number of subjects that are expected to complete the study.

Although the numerical results may change slightly depending on the statistical package and the number of iterations or the algorithm used to estimate the parameters, the simulations presented in this article strongly suggest that on the whole the empirical power based on REML estimates is in fairly good agreement with the theoretical power based on OLS estimates. However, it has also become clear from the present study that, with complex statistical models, sample size estimation using simulations may be needed. One reason why the Monte Carlo power method may be preferred over a theoretical method in some cases is because of its great flexibility to be applied to almost any kind of data, regardless of whether all the model assumptions are satisfied, the type of covariates present, and the attrition rate expected. In fact, the sample size calculation through simulation can easily be extended to more complex linear mixed models or generalized linear mixed models, both univariate and multivariate.

Recommendations

As we noted earlier, when performing a prospective power analysis and no information is available regarding the growth model parameters, researchers may explicitly specify parameters by indirectly setting four types of indices (ρ1, k1, dL, r1) for a linear trend. In some cases, this is a reasonable approximation, but in other cases it may become a tricky task. Hence, a range of values often need to be considered.

  1.

    Reliability (ρ1) depends on what measure is being used. The reader should note, however, that questionnaire measures, which represent one of the most important tools available for data collection in the educational and social sciences, appear to have relatively low reliability. Hence, reliabilities in the .4–.7 range would provide a reasonable starting point when planning research.

  2.

    Empirical studies have indicated that, under most situations likely to be encountered by behavioral science researchers, the ratio between the variances of the outcomes at the end and at the beginning of a study (k1) could be more than five times smaller than the value we have examined (cf. Hertzog, Lindenberger, Ghisletta, & von Oertzen, 2008). Thus, the sample size requirements will be less demanding than those shown in the tables.

  3.

    The average effect size (dL) found in published meta-analyses in psychology is around dL = 0.50 (see Bakker, van Dijk, & Wicherts, 2012). An effect size in the range of 0.4–0.6 is regarded as typical. We have not been able to find any guidelines on how to select these effect sizes for a quadratic growth model. Although this issue is an open question and should be investigated, provisionally we have assumed an effect size of one-half of a standard deviation unit for the rate of acceleration (i.e., dQ = 0.50).

  4.

    Although the correlation between the starting point and the rate of change over time (r1) is not known precisely, different authors (Hertzog et al., 2008; Hox, 2010) have suggested that it is unlikely that this correlation would take values close to zero in a given population. Hence, correlations in the .25–.50 range are reasonable choices when planning a longitudinal study.

Finally, for completeness, three caveats are included. First, it should be clear that the sample size required to detect an intervention effect is study-specific. Second, although longitudinal studies often involve small samples, it is very important to emphasize that large sample sizes make small effect sizes detectable. Therefore, researchers interested in carrying out studies with sufficient power to reject the null hypothesis should avoid using small sample sizes whenever possible. This is especially the case when they are unable to specify a minimum effect size that would have either practical or theoretical significance. Third, it should be noted that the reliabilities studied (i.e., .1 and .5) are on the low side. Since unreliability reduces statistical power, more positive results should be obtained with higher initial reliabilities. If reliability were improved to .80, for example, the potential reduction in sample size would be approximately 20%. Hence, researchers should make an effort to reduce the effects of measurement error.

Limitations of this study

In our simulation study we saw that the theoretical power values based on the sample size formulas derived using the OLS estimates were nearly identical to the empirical power based on the ML estimates, even with a combination of heterogeneous variances and missing data. However, readers should note that the generalization of our results is limited to situations in which the mechanism for missing data is MAR. When missing data due to attrition are driven by an MAR mechanism, the standard likelihood-based method provides valid inferences about differences in growth rates between groups. In contrast, when the missing data are not MAR (NMAR), the likelihood-based method yields erroneous inferences (failure to control the Type I error rate and distorted power). Thus, caution should be exercised if the missingness is thought to be NMAR. To improve the validity of estimates, it is recommended that researchers determine why data are missing and build models that include covariates that may be predictive of dropping out.

An additional limitation of our study is that the results and recommendations are based on assuming normality for the continuous outcome variable. The effect of nonnormality on the power would not be of much consequence in the case of near-normal populations. However, the presence of a fair degree of skewness and/or kurtosis, as is not uncommon in educational and psychological studies (see, e.g., Blanca, Arnau, López-Montiel, Bono, & Bendayan, 2013; Cain, Zhang, & Yuan, 2017; Micceri, 1989), would lead to a more conservative alpha level and, thus, to more demanding sample size requirements.

Finally, for computational simplicity, we assumed that the model included only one categorical predictor (e.g., the program studied). However, it is possible to increase precision in the estimation of treatment effects if effective covariates are used in the design. In fact, continuous variables are sometimes included in longitudinal studies as predictors or baseline covariates. In general, as long as the covariates are independent of the group assignments and do not modify the group effects, making an adjustment for baseline response will increase statistical power, because it can be expected that the adjustment will reduce the between- and within-subjects variability.
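The gain from covariate adjustment can be sketched with a textbook approximation: if a baseline covariate correlates rho with the outcome, adjustment shrinks the residual standard deviation by sqrt(1 - rho²), which inflates the effective standardized effect size and hence the power. The two-sample z-test power formula below is a simplified illustration (it ignores the lower rejection region and degrees-of-freedom corrections), not the models used in this study:

```python
import math
from statistics import NormalDist

def power_two_group(d, n, alpha=0.05):
    """Approximate power of a two-sided, two-sample z-test with
    standardized effect size d and n subjects per group."""
    z_a = NormalDist().inv_cdf(1 - alpha / 2)
    ncp = d * math.sqrt(n / 2)  # noncentrality of the test statistic
    return 1 - NormalDist().cdf(z_a - ncp)

# Adjusting for a covariate with correlation rho to the outcome reduces
# the residual SD by sqrt(1 - rho**2), so the effective d grows.
d, n, rho = 0.50, 40, 0.6
d_adj = d / math.sqrt(1 - rho ** 2)
print(round(power_two_group(d, n), 2))      # unadjusted power
print(round(power_two_group(d_adj, n), 2))  # covariate-adjusted power
```

With the illustrative values above (rho = .6), adjustment raises power noticeably at the same sample size, consistent with the point that effective baseline covariates reduce residual variability.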

Author note

We are grateful to the Editor, Associate Editor Wei Wu, and reviewers for their constructive suggestions on a draft of this manuscript.

This work has been funded by the Spanish Ministry of Science and Innovation (Ref.: PSI-2015-67630-P) and the Chilean National Fund for Scientific and Technological Development (FONDECYT, Ref.: 1170642).