Basing the simulations on a preceding well-powered design is arguably the most desirable solution, since we can utilize a (G)LMM fitted to real and independent empirical data. This provides us with parameter estimates for fixed and random effects, as well as estimates for the coefficients of possible covariates, eliminating guesswork and possibly biased assumptions. Here we discuss the key steps and the theoretical background for the problem space of this scenario; a step-by-step procedure can be found in Notebook 1. Our example focuses on an LMM with crossed random factors, but we also provide a notebook demonstrating how to conduct a power analysis for a GLMM with nested random effects.
To more closely mimic real-world analysis demands, we intentionally demonstrate power estimation using a rather complex data set. We will work with data from a study published by Yan, Zhou, Shu, Yusupu, Miao, Kruegel, and Kliegl (2014) examining eye movements during reading. Yan et al. (2014) tested 48 subjects, each of whom read 120 sentences. During reading, gaze moves between different positions in the text to acquire all relevant information, and various factors influence where a reader moves their eyes next. Amongst other questions, the authors investigated the effects of word length, word frequency, and morphological complexity (i.e. number of suffixes) on saccades' first landing positions (FLPs) during reading (i.e., the position in a word the eyes first land on). Suppose the goal is to conduct a study replicating and further investigating the effect of morphological complexity and word length on the saccades' first landing position. In line with the results of Yan et al. (2014), we expect the FLP to increase (i.e. the eyes first land on a position further away from the start of the word) with increasing word length. However, we expect morphological complexity to interact with this word length effect, such that more suffixes shift the FLP towards the beginning of the word. Here, we would need to conduct a power analysis to inform the sample size of the follow-up study.
First, we need the appropriate model fitted with lme4 and the data from Yan et al. (2014) available to us (Fig. 2). Note that all scenarios work under the assumption that an optimal model is selected prior to the power analysis (see e.g. Matuschek et al., 2017, for information on how power and model complexity interact).
To proceed, we use the preprocessed data frame of Yan et al. (2014) in which both continuous predictors are already centered.
The FLPmodel includes word length (β = 1.511) and morphological complexity (β = −0.075) as well as their interaction (β = 0.116) as fixed effects (see Table 1). Moreover, we included by-subject and by-sentence random intercepts for the random variables subject and sentence, making this model a typical example with crossed random effects as described by Baayen et al. (2008).
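In lme4 syntax, such a model can be specified as in the following sketch. The object and column names (FLPdata, flp, word_length_c, complexity_c, subject, sentence) are illustrative placeholders; the exact names are given in Notebook 1.

```r
# Minimal sketch of the FLP model fit; all object and column names are
# illustrative placeholders -- the exact ones are documented in Notebook 1.
library(lme4)

FLPmodel <- lmer(flp ~ word_length_c * complexity_c +  # fixed effects + interaction
                   (1 | subject) + (1 | sentence),     # crossed random intercepts
                 data = FLPdata)
summary(FLPmodel)
```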
Table 1 Summary of the FLP model

Generally, and with some amount of experience, it is possible to implement simulations from scratch. However, several premade software packages are available to simplify and speed up this process (e.g. simglm (LeBeau, 2019), pamm (Martin, 2012), powerlmm (Magnusson, 2018)). In this tutorial, we will focus on the two complementary packages mixedpower (Kumle, Võ, & Draschkow, 2018) and simr (Green & MacLeod, 2016), as they allow for power simulations for a wide range of (G)LMMs with different fixed- and random-effect structures.
Mixedpower
As a start, mixedpower will be used to estimate power for the planned study (for a detailed introduction to all functions included in the package, see the documentation). It allows for the estimation of power for all specified fixed effects and their interactions simultaneously, and it is comparatively time-efficient due to its parallelized computational architecture. While simulation-based power solutions for more complex models are still rather time-consuming, mixedpower is an efficient solution when power for multiple effects and parameter specifications is of interest, especially for large and complex data sets. We use mixedpower here since it is designed to be of didactic value and to support an intuitive understanding of simulation-based power estimation in general. For the sake of completeness, Notebook 1 additionally includes examples using the extremely flexible simr package (Green & MacLeod, 2016).
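To follow along, the package needs to be installed and loaded first. A minimal setup sketch, assuming installation from the package's GitHub repository as described in its documentation:

```r
# mixedpower is distributed via GitHub (repository path as given in the
# package documentation); simr, by contrast, is available from CRAN.
# install.packages("devtools")
# devtools::install_github("DejanDraschkow/mixedpower")
library(mixedpower)
```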
To determine the sample size for a prospective study, estimating power over a range of different sample sizes is highly informative. Mixedpower provides the eponymous mixedpower() function, which can be used to simulate power for one random variable (e.g. participants), that is, the factor which is randomly sampled from the population to which we wish to generalize our results. The simulation process inside mixedpower() closely follows the steps introduced in Fig. 1, with the first step consisting of simulating data sets. To achieve this, mixedpower() requires various pieces of information about the simulation process.
First, in addition to specifying the model and data, all fixed effects included in the model need to be stated explicitly. Mixedpower then uses the data entered and the structure captured by the fitted model to simulate new data using the simulate.merMod() function in the lme4 package (version > 1.1-6; Bates, Mächler, et al., 2015b). More specifically, simulate.merMod() generates new values for the dependent variable from the provided, fitted model. Simulated values are sampled from the distribution corresponding to the link function of the provided model (i.e. a Gaussian distribution for LMMs, or the distribution corresponding to the "family" in GLMMs, e.g. "binomial"); that is, the simulation process assumes that the dependent variable follows the distribution expected by the model type. Accordingly, the simulation of new values will be less appropriate if distributional assumptions are not met by the initial, fitted model. It is thus critical that the optimal model is selected prior to the power analysis.
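For illustration, this underlying simulation step can be reproduced directly with lme4. A minimal sketch, assuming the FLPmodel object from above:

```r
# simulate() dispatches to simulate.merMod() for lme4 models and draws new
# values of the dependent variable from the fitted model's distribution.
simulated_flp <- simulate(FLPmodel, nsim = 1, seed = 42)
head(simulated_flp)  # one column of simulated FLP values, one row per observation
```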
Next, it is necessary to indicate which random variable should vary in the simulation (e.g. simvar = "subject"), which in this example implies that data sets with a range of different sample sizes are simulated in the power analysis procedure. Mixedpower() then creates a new data set containing simulated response values and the requested number of observations. Therefore, we will enter plausible sample sizes that we wish to estimate power for (e.g. steps of 20, 30, 40, 50, and 60 participants). Subsequently, the simulated data are used to refit the model entered into the simulation and to perform an inferential significance evaluation (Fig. 1, step 2). The final parameter in need of specification in this simulation-based power framework, therefore, is the significance threshold. In general, increasing this threshold will lead to lower estimated power, as the threshold becomes harder to reach, and vice versa. Mixedpower relies on lme4, which does not provide p-values. Even though there are methods available to compute p-values in mixed models, they come with ambiguity, because degrees of freedom in (G)LMMs are hard to determine (Baayen et al., 2008; Luke, 2017). Mixedpower therefore works with the available t-values for LMMs or z-values for GLMMs, and all coefficients exceeding the selected t or z threshold are counted as significant. As it is plausible to have different criteria for different fixed effects (e.g. depending on whether the inclusion of an effect is of confirmatory or exploratory nature), mixedpower allows for the specification of a separate criterion for every effect as well as one criterion applied to all specified effects. For our use case we want to apply the same threshold to all specified effects, and thus will enter a t-value of 2 (critical_value = 2) into the simulation, as this reflects an alpha level of 5% (Baayen et al., 2008). Additional details about the inner workings of mixedpower can be found in the documentation.
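Putting these pieces together, the data-based simulation for the FLP example could look like the following sketch. Argument names follow the mixedpower documentation; the data object and column names are the illustrative ones assumed above.

```r
# Data-based power simulation over a range of sample sizes (participants).
power_FLP <- mixedpower(model = FLPmodel, data = FLPdata,
                        fixed_effects = c("word_length_c", "complexity_c"),
                        simvar = "subject",             # random variable to vary
                        steps = c(20, 30, 40, 50, 60),  # sample sizes to simulate
                        critical_value = 2)             # |t| threshold, ~ alpha of 5%
power_FLP
```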
Figure 3 visualizes the outcome of the power analysis: power increases with sample size for the effect of complexity and for the interaction. For the effect of word length, in contrast, no changes can be observed, as its large effect size keeps power at ceiling across all simulated sample sizes. Since we used the exact coefficients (i.e. effect sizes) found in the empirical data, the corresponding results are data-based. Data-based estimations use the beta coefficients found in the empirical data, while SESOI (i.e. smallest effect size of interest) estimations are based on adjusted effect sizes, introduced below.
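A plot along the lines of Fig. 3 can be produced directly from the simulation output; a one-line sketch, assuming the plotting helper named in the mixedpower documentation:

```r
# Visualize estimated power per effect and sample size (cf. Fig. 3).
multiplotPower(power_FLP)
```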
Smallest effect size of interest
So far, all results rely entirely on the exact effect sizes found in the empirical data. Given the struggle for reproducibility in various subdomains of psychology (Ioannidis, 2005; Szucs & Ioannidis, 2017; Yong, 2012), adopting effect sizes from published data carries the risk of performing the analysis on inflated effect sizes, which in turn would result in an underpowered design. Therefore, a way of protecting against such bias or uncertainty in the data used for simulation is desirable. One approach is choosing the smallest effect size of interest (SESOI) for the power analysis, making it possible to design studies which are worthwhile to run, as they have a predetermined power to detect an effect that is of interest (Albers & Lakens, 2018). This requires knowledge of what an effect "just large enough to be worth discovering" looks like and how to express it on the appropriate numerical scale of the model.
Determining the SESOI for (G)LMMs is difficult in a simulation-based approach, where effect sizes are expressed through the model's unstandardized beta coefficients. While Westfall et al. (2014) introduced a method for calculating effect sizes for designs with one fixed and two random effects (also see Judd, Westfall, & Kenny, 2017), relating effect sizes to beta coefficients in complex models is far from trivial, and we therefore refrain from making specific recommendations. Instead, we wish to highlight an approach introduced by Brysbaert and Stevens (2018). To change the effect size, the authors directly manipulated the data used to inform the power analysis (e.g. by adding a constant to the reaction times in a certain condition). Refitting the model with the manipulated data then provides information on how such a change is reflected in the beta coefficients. However, we acknowledge that this approach is likely not applicable in all use cases and that more work is needed to establish informed decision-making for SESOIs in (G)LMMs. Until then, guidance can come from previous research, the literature, or practical constraints (for a more detailed discussion see Lakens, Scheel, & Isager, 2018). Additionally, repeating a power simulation for different plausible effect sizes that are not necessarily the SESOI is worthwhile, as this allows us to examine how sensitive a design's power is to such changes and to develop a better intuition for the resulting power under different plausible scenarios.
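A sketch of this data-manipulation idea for the FLP example, with a hypothetical shift of 0.5 character positions for words of above-average complexity (both the condition split and the shift size are purely illustrative, not a recommendation):

```r
# Shift the dependent variable by a constant in one "condition" and refit,
# to see how the manipulation surfaces in the beta coefficients
# (approach of Brysbaert & Stevens, 2018; all specifics here are illustrative).
FLPdata_shifted <- FLPdata
above_avg <- FLPdata_shifted$complexity_c > 0  # hypothetical condition split
FLPdata_shifted$flp[above_avg] <- FLPdata_shifted$flp[above_avg] + 0.5

FLPmodel_shifted <- update(FLPmodel, data = FLPdata_shifted)  # refit on shifted data
fixef(FLPmodel_shifted)  # compare against fixef(FLPmodel)
```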
Implementing this approach requires changing the beta coefficients in our model in order to run a SESOI power analysis. Coming back to the previous example and the FLPmodel, SESOIs for the specified effects need to be selected and integrated into the model, allowing us to vary sample size and effect size simultaneously. The default values in mixedpower() (i.e. SESOI = FALSE, databased = TRUE), which we used when estimating "power_FLP" above, include the data-based simulation (i.e. effects as found in the data) but not a SESOI simulation. To include a SESOI simulation, the mixedpower() function can be handed a vector of SESOIs in the form of the desired beta coefficients for all specified fixed effects using the SESOI argument. Here, we default to a simple justification strategy of reducing all beta coefficients by 15%. Since we already computed a data-based simulation in "power_FLP", we additionally set databased = FALSE to make the next simulation as efficient as possible.
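A sketch of the SESOI simulation under this 15% reduction strategy; whether the intercept belongs in the SESOI vector, and the exact expected order of coefficients, should be checked against the mixedpower documentation:

```r
# SESOI vector: intercept kept as estimated (an assumption -- see the
# mixedpower documentation for the exact expected format), all other
# beta coefficients reduced by 15%.
SESOI <- c(fixef(FLPmodel)[1], fixef(FLPmodel)[-1] * 0.85)

power_SESOI <- mixedpower(model = FLPmodel, data = FLPdata,
                          fixed_effects = c("word_length_c", "complexity_c"),
                          simvar = "subject",
                          steps = c(20, 30, 40, 50, 60),
                          critical_value = 2,
                          SESOI = SESOI,
                          databased = FALSE)  # data-based run already done above
```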
As can be seen in Fig. 3, which combines the data-based simulation in "power_FLP" and the SESOI simulation in "power_SESOI", simulations based on the SESOIs lead, as expected, to more conservative estimates for the effect of complexity and the interaction between complexity and word length, while the main effect of word length remains highly powered even for the specified SESOI across all examined sample sizes.