The powerBBK package is based on the following treatment effect regression model
$$\begin{aligned} y_{it}^* = \beta _0 + d_{it}\beta _{1,t} + \mu _i + \epsilon _{it}, \end{aligned}$$
(1)
where \(y_{it}^*\) denotes the latent outcome variable of subject i at period t and \(\mathbf {d}_{i}=[d_{i1},\ldots ,d_{iT}]\) is a vector of time-varying treatment variables, with \(d_{it}=1\) when subject i receives treatment at period t and 0 otherwise. The parameters of interest are \(\varvec{\beta }_1 = [\beta _{1,1},\ldots ,\beta _{1,T}]'\). This specification nests as a special case a time-invariant treatment effect model (where all \(\beta _{1,t}\) are identical). Treatment variables \(\mathbf {d}_{i}\) are allowed to be either dichotomous or continuous. Time-invariant unobserved heterogeneity is captured by \(\mu _i\) with corresponding cumulative distribution function \(F_\mu\). The remaining errors \(\epsilon _{it}\) are drawn from a cumulative distribution function \(F_{\epsilon |\mathbf {d}}\). We allow the errors to be heteroscedastic: the variance of the errors \(\epsilon _{it}\) can depend on treatment conditions \(\mathbf {d}_{i}\). We denote by \(\sigma ^2_{\epsilon ,\mathbf {d}}\) the variance of \(\epsilon _{it}\) conditional on treatment.
A between-subjects (hereafter BS) design implies that, for each subject, the sequence \(\{d_{it}:t=1,\ldots ,T\}\) is constant across t. For the case of a binary BS treatment, a subject is assigned either only to the control condition (\(d_{it}=0\) for all t) or only to the treatment condition (\(d_{it}=1\) for all t). With a continuous BS treatment, each subject is randomly assigned a treatment drawn from a researcher-specified set of treatment values. In the presence of homoscedastic errors \(\epsilon _{it}\), the noise level \(\mu _i + \epsilon _{it}\) is the same for treatment and control conditions. In this case it is reasonable to implement a BS design by assigning an equal number of subjects to control and treatment conditions. In the presence of heteroscedastic errors \(\epsilon _{it}\), statistical power can possibly be improved by assigning more subjects to the conditions where the noise level is higher.
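The intuition for unequal allocation under heteroscedasticity can be illustrated with a simplified calculation. This is a sketch under stated assumptions, not the estimator used by the package: suppose a binary BS design with a time-invariant effect \(\beta _1\), a directly observed linear outcome, errors \(\epsilon _{it}\) that are independent across t and of \(\mu _i\), and a treatment effect estimated by the difference in subject-level means. The symbols \(N_0\), \(N_1\) (subjects in control and treatment) and \(\sigma ^2_\mu\) (variance of \(\mu _i\)) are notation introduced here for illustration. Each subject-level mean then has variance \(\sigma ^2_\mu + \sigma ^2_{\epsilon ,d}/T\), so that
$$\begin{aligned} \mathrm {Var}(\hat{\beta }_1) = \frac{\sigma ^2_\mu + \sigma ^2_{\epsilon ,0}/T}{N_0} + \frac{\sigma ^2_\mu + \sigma ^2_{\epsilon ,1}/T}{N_1}. \end{aligned}$$
For a fixed total \(N = N_0 + N_1\), this variance is minimized when
$$\begin{aligned} \frac{N_1}{N_0} = \sqrt{\frac{\sigma ^2_\mu + \sigma ^2_{\epsilon ,1}/T}{\sigma ^2_\mu + \sigma ^2_{\epsilon ,0}/T}}, \end{aligned}$$
which equals 1 under homoscedasticity and exceeds 1 when the treatment condition is noisier than the control condition.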
A within-subjects (hereafter WS) design implies that \(\{d_{it}:t=1,\ldots ,T\}\) varies across t for each subject. In the presence of homoscedastic errors \(\epsilon _{it}\), it is reasonable to use a balanced WS design with \(d_{it}=0\) for \(T/2\) periods, as long as the expected cost of a subject is approximately the same under both treatment conditions. In the presence of heteroscedastic errors \(\epsilon _{it}\), statistical power may be improved by assigning subjects to the noisier conditions for a higher number of periods. Finally, we maintain the assumption that \(\mu _i\) is independent of all \(d_{it}\). This assumption is typically motivated by the randomization of subjects to treatment conditions.
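The two design types can be made concrete by writing out the treatment sequences \(\{d_{it}\}\) they imply. The following Python sketch is purely illustrative and is not part of powerBBK. The function names `bs_design` and `ws_design`, their arguments, and the choice to place control periods before treatment periods in the WS case are all assumptions made for this example.

```python
import random

def bs_design(n_subjects, n_periods, share_treated=0.5, rng=None):
    """Binary BS design: each subject keeps one condition in every period."""
    rng = rng or random.Random(0)
    n_treated = round(n_subjects * share_treated)
    labels = [1] * n_treated + [0] * (n_subjects - n_treated)
    rng.shuffle(labels)  # random assignment of subjects to conditions
    return [[d] * n_periods for d in labels]

def ws_design(n_subjects, n_periods, periods_treated=None):
    """Binary WS design: every subject experiences both conditions.

    Assumption of this sketch: control periods come first.
    """
    if periods_treated is None:
        periods_treated = n_periods // 2  # balanced design
    row = [0] * (n_periods - periods_treated) + [1] * periods_treated
    return [list(row) for _ in range(n_subjects)]

bs = bs_design(n_subjects=10, n_periods=4)
ws = ws_design(n_subjects=3, n_periods=4)
```

Raising `share_treated` above 0.5, or `periods_treated` above `n_periods // 2`, gives the unbalanced variants that may be preferable when the treatment condition is noisier.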
The powerBBK package considers three leading data-generating processes.
- Case 1. Linear model: \(y_{it} = y_{it}^*.\)
- Case 2. Binary choice model: \(y_{it} = 1 \text { if } y_{it}^* \ge 0\), and 0 otherwise.
- Case 3. Model with censoring from below at a: \(y_{it} = \max (a,y_{it}^*),\)
where the observable outcome variable \(y_{it}\) may differ from \(y_{it}^*\) according to the case considered. With this parameterization we can generate samples for different sequences \(\{d_{it}:t=1,\ldots ,T\}\) given values of \((\beta _0,\varvec{\beta }_1)\) and \((F_\mu , F_{\epsilon |\mathbf {d}})\). Identification of \((\beta _0,\varvec{\beta }_1)\) requires some minimal restrictions on the functions \((F_\mu , F_{\epsilon |\mathbf {d}})\). Mean independence with the treatment indicator is sufficient for the linear model (Case 1). Independence between \(\epsilon _{it}\) is typically assumed for Cases 2 and 3. Note that Cases 1 and 3 allow the variance of \(\epsilon _{it}\) to differ between control and treatment conditions. For Case 1, the user can specify any distribution available in Stata for \(F_{\epsilon |\mathbf {d}}\). The package implements Case 2 as either a probit or a logit model, thus setting \(F_\epsilon\) to the standard normal or the logistic distribution, respectively. The package implements Case 3 by setting \(F_{\epsilon |\mathbf {d}}\) to a mean-zero normal distribution with variance \(\sigma ^2_{\epsilon ,\mathbf {d}}\), which is the familiar tobit model. The distribution \(F_\mu\) is always assumed to be normal with a user-specified standard deviation, as most panel data models rely on this assumption in the estimation procedure.
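The three data-generating processes can be sketched in a few lines of Python. This is an illustration of the model in equation (1), not the package's Stata code. The function `simulate_outcomes` and its arguments are hypothetical, and normal errors \(\epsilon _{it}\) are assumed throughout for simplicity (so the binary case corresponds to a probit when the error standard deviation is 1).

```python
import random

def simulate_outcomes(design, beta0, beta1, sd_mu, sd_eps,
                      case="linear", censor_at=0.0, rng=None):
    """Return y[i][t] for a binary design d[i][t] under Cases 1 to 3.

    beta1  : list of period-specific treatment effects
    sd_eps : dict mapping treatment value (0 or 1) to the error standard
             deviation, which allows treatment-specific heteroscedasticity
    Normal errors are an assumption of this sketch.
    """
    rng = rng or random.Random(0)
    data = []
    for d_i in design:
        mu_i = rng.gauss(0.0, sd_mu)  # time-invariant heterogeneity
        y_i = []
        for t, d in enumerate(d_i):
            eps = rng.gauss(0.0, sd_eps[d])
            y_star = beta0 + d * beta1[t] + mu_i + eps  # latent outcome
            if case == "linear":        # Case 1
                y = y_star
            elif case == "binary":      # Case 2
                y = 1 if y_star >= 0 else 0
            elif case == "censored":    # Case 3, censoring from below
                y = max(censor_at, y_star)
            else:
                raise ValueError(f"unknown case: {case}")
            y_i.append(y)
        data.append(y_i)
    return data

design = [[0, 0, 0], [1, 1, 1]]  # one control and one treated subject (BS)
y = simulate_outcomes(design, beta0=0.5, beta1=[1.0, 1.0, 1.0],
                      sd_mu=1.0, sd_eps={0: 1.0, 1: 2.0}, case="censored")
```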
The data-generating process described above is relatively flexible in terms of the type of outcome distributions it can capture. This is especially true for Case 1. The package currently does not support other discrete outcomes, notably multinomial choices or ordered responses. The powerBBK package is free and open source, allowing users to extend it to suit their needs.
The powerBBK package requires the user to specify details of the experimental design, such as the number of subjects, the number of periods, whether the design is WS or BS, and the balance of a WS design. There are options to evaluate statistical power over a range of values of N and to assess the power of WS and BS designs simultaneously. The user can specify whether to include individual heterogeneity by means of random-effects terms (i.e., the variance of \(\mu _i\) is greater than 0) and whether to include treatment-specific heteroscedasticity (i.e., the variance of \(\epsilon _{it}\) depends on the treatment received). Where appropriate (e.g., in linear models), users can also specify the error distributions \((F_\mu , F_{\epsilon |\mathbf {d}})\) used in the simulations, which allows, for example, heavy-tailed distributions in linear models. The package further permits users to simulate the power of nonparametric rank-based tests and can accommodate several common non-linear models (i.e., logit, probit, tobit). Users can use the package to predict the maximal power a design can reach given a user-specified budget constraint with treatment-specific costs. Additional information and examples are available in the help file provided with the package.
Computing power of a given design is straightforward using the following steps.
Step 1. Fix N and T. For a given design (WS or BS), values of \((\beta _0,\varvec{\beta }_1)\), and a choice of \((F_\mu , F_{\epsilon |\mathbf {d}})\), generate a sample \(\{\{(y_{it},d_{it}):t=1,\ldots ,T\}:i=1,\ldots ,N\}\).
Step 2 (parametric). Estimate \((\beta _0,\varvec{\beta }_1)\) and the parameters of \((F_\mu , F_{\epsilon |\mathbf {d}})\), then compute \(\hat{z}_t = \hat{\beta }_{1,t}/se(\hat{\beta }_{1,t})\) and the corresponding p value for the null hypothesis \(H_0:\beta _{1,t}=0\) against either a one-sided or a two-sided alternative. Here \(se(\hat{\beta }_{1,t})\) denotes the standard error of the estimated period t treatment effect.
Step 2 (nonparametric). Aggregate the individual data over the T periods and apply a nonparametric rank-based test (e.g., the Wilcoxon rank-sum test for BS data or the Wilcoxon signed-rank test for WS data) of the null hypothesis that the distribution of the aggregated values of y is the same under control and treatment conditions, then compute the p value of the test.
Step 3. Repeat Steps 1 and 2 for a large number of samples and compute the fraction of p values that are less than the significance level of the test (e.g., 5%). This fraction is the simulated power of the test.
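The steps above can be sketched as a short Monte Carlo loop. The Python example below is a minimal illustration, not the package's implementation. It assumes a balanced binary BS design, a linear outcome with normal errors and \(\beta _0 = 0\), subject-level means as the aggregate, and the nonparametric variant of Step 2 using the large-sample normal approximation to the Wilcoxon rank-sum test without tie correction. The names `rank_sum_p_value` and `simulate_power` are hypothetical.

```python
import math
import random

def rank_sum_p_value(x, y):
    """Two-sided Wilcoxon rank-sum p value (normal approximation, no ties)."""
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    # Sum of the ranks held by the first sample in the pooled ordering
    w = sum(rank for rank, (_, g) in enumerate(pooled, start=1) if g == 0)
    n1, n2 = len(x), len(y)
    mean_w = n1 * (n1 + n2 + 1) / 2
    sd_w = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (w - mean_w) / sd_w
    return math.erfc(abs(z) / math.sqrt(2))  # equals 2 * (1 - Phi(|z|))

def simulate_power(n_per_arm, n_periods, effect, sd_mu, sd_eps,
                   n_reps=500, alpha=0.05, seed=1):
    """Fraction of simulated samples in which the test rejects at level alpha."""
    rng = random.Random(seed)

    def subject_mean(d):
        # Step 1 for one subject: draw mu_i, then T outcomes, then aggregate
        mu_i = rng.gauss(0.0, sd_mu)
        ys = [d * effect + mu_i + rng.gauss(0.0, sd_eps)
              for _ in range(n_periods)]
        return sum(ys) / n_periods

    rejections = 0
    for _ in range(n_reps):  # Step 3: repeat Steps 1 and 2
        control = [subject_mean(0) for _ in range(n_per_arm)]
        treated = [subject_mean(1) for _ in range(n_per_arm)]
        if rank_sum_p_value(control, treated) < alpha:  # Step 2
            rejections += 1
    return rejections / n_reps

power = simulate_power(n_per_arm=30, n_periods=4, effect=0.5,
                       sd_mu=1.0, sd_eps=1.0)
```

Calling `simulate_power` over a grid of `n_per_arm` and `n_periods` values gives the points of a power curve. With `effect=0.0` the returned fraction should be close to `alpha`, which is a useful sanity check on any such simulation.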
Repeating the three steps above over a range of N and T values for each design enables the researcher to plot power curves for each element of \(\varvec{\beta }_1\). Power curves are useful for comparing designs at a given sample size, for determining the minimal sample size needed to reach a given statistical power under each design, and for examining the effect of the number of periods and of how participants are balanced across treatments. The package also offers users the possibility of predicting the maximal power an experimental design can reach given a specified budget constraint. In this case, users must additionally specify the expected payoff of a participant in each treatment as well as the total available budget. The package then evaluates the power of a series of user-specified allocations, which allows users to determine the allocation that maximizes power. Finally, an issue concerning WS designs is the possibility of treatment order effects. These effects imply that the response depends on whether the treatment or the control condition is experienced first. The powerBBK package can be used to predict the probability of detecting a user-specified treatment order effect for a given experimental design. This option is currently implemented only for the time-invariant binary treatment effect model, where all elements of \(\varvec{\beta }_1\) are identical.
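The budget-constrained comparison of allocations can be illustrated as follows. This Python sketch is not how powerBBK computes power: to stay short, it swaps the simulation for the textbook analytic power of a two-sided z test on a difference in means with known standard deviations. The functions `approx_power` and `best_allocation`, and all numerical values, are hypothetical. Here `sd0` and `sd1` stand for the standard deviations of the subject-level outcome in the control and treatment conditions.

```python
from statistics import NormalDist

def approx_power(n0, n1, effect, sd0, sd1, alpha=0.05):
    """Analytic power of a two-sided z test for a difference in means."""
    se = (sd0 ** 2 / n0 + sd1 ** 2 / n1) ** 0.5
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    shift = abs(effect) / se
    return NormalDist().cdf(shift - z_crit) + NormalDist().cdf(-shift - z_crit)

def best_allocation(allocations, cost0, cost1, budget, effect, sd0, sd1):
    """Return (power, n0, n1) for the affordable allocation with highest power."""
    feasible = [(n0, n1) for n0, n1 in allocations
                if n0 * cost0 + n1 * cost1 <= budget]
    scored = [(approx_power(n0, n1, effect, sd0, sd1), n0, n1)
              for n0, n1 in feasible]
    return max(scored) if scored else None

# Candidate (control, treatment) subject counts supplied by the researcher
candidates = [(30, 30), (20, 40), (40, 20), (50, 50)]
best = best_allocation(candidates, cost0=10, cost1=10, budget=600,
                       effect=0.8, sd0=1.0, sd1=2.0)
```

In this example the allocation (50, 50) exceeds the budget, and among the affordable candidates the one that places more subjects in the noisier treatment condition, (20, 40), has the smallest standard error and therefore the highest approximate power.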